Big Data and Data Science Master's Course

Master Program
4.8 509 Ratings 2,618 Learners

Our Big Data and Data Science master’s course lets you gain proficiency in Big Data and Data Science. You will work on real-world projects in Hadoop Dev, Admin, Test, and Analysis, Apache Spark, Scala, AWS, Tableau, Artificial Intelligence, Deep Learning, Python for Data Science, SAS, R, Splunk Developer and Admin, NoSQL databases, and more. In this program, we will cover 20 courses and 56 industry-based projects.

  • 20+

    Courses

  • 56+

    Projects

  • 322

    Hours

What you will Learn 19 Courses

  • Online Classroom Training

    • Course 1
      Big Data Hadoop & Spark
    • Course 2
      Apache Spark & Scala
    • Course 3
      Data Science with R
    • Course 4
      Python for Data Science
    • Course 5
      Tableau Desktop 10
    • Course 6
      Splunk Developer & Admin
    • Course 7
      Data Science with SAS
    • Course 8
      AI Deep Learning Course
    • Course 9
      MongoDB
    • Course 10
      AWS
    • Course 11
      Microsoft Azure Training
  • Self-paced Training

    • Course 12
      Apache HBase
    • Course 13
      Apache Cassandra
    • Course 14
      Couchbase
    • Course 15
      Machine Learning
    • Course 16
      Solr
    • Course 17
      Linux & Java
    • Course 18
      Apache Kafka
    • Course 19
      SQL
  • Get Master's Certificate

Key Features

322 Hrs Instructor Led Training
381 Hrs Self-paced Videos
528 Hrs Project work & Exercises
Certification and Job Assistance
Flexible Schedule
Lifetime Free Upgrade
24/7 Support & Access

Course Fees

Self Paced Training

  • 381 Hrs e-learning videos
  • Lifetime Free Upgrade
  • 24/7 Support & Access
$1,755

Online Classroom preferred

  • Everything in self-paced, plus
  • 322 Hrs of Instructor-led Training
  • 1:1 Doubt Resolution Sessions
  • Attend as many batches for Lifetime
  • Flexible Schedule
  • 31 Oct
  • SAT - SUN
  • 08:00 PM TO 11:00 PM IST (GMT +5:30)
  • 03 Nov
  • TUE - FRI
  • 07:00 AM TO 09:00 AM IST (GMT +5:30)
  • 07 Nov
  • SAT - SUN
  • 08:00 PM TO 11:00 PM IST (GMT +5:30)
  • 14 Nov
  • SAT - SUN
  • 08:00 PM TO 11:00 PM IST (GMT +5:30)
$3,779 $2,018 10% OFF

Corporate Training

  • Customized Learning
  • Enterprise grade learning management system (LMS)
  • 24x7 support
  • Strong Reporting

Overview

Intellipaat’s Big Data and Data Science master’s course will provide you with in-depth knowledge of designing, developing, and deploying Data Science and Big Data applications in the real world, along with performance tuning of applications. This course will make you a Big Data and Data Science architect, and by the end of the course, you will have expertise on Hadoop Developer, Administration, Testing, and Analysis modules, working with real-time analytics, statistical computing, parsing machine-generated data, creating NoSQL applications, and finally in the domain of Deep Learning in Artificial Intelligence. This program is specially designed by industry experts, and you will get 20 courses with 56 industry-based projects.

List of Courses Included:

Online Instructor-led Courses:

  • Big Data Hadoop and Spark
  • Apache Spark and Scala
  • Data Science with R
  • Python for Data Science
  • Tableau Desktop 10
  • Splunk Developer and Admin
  • Data Science with SAS
  • Artificial Intelligence and Deep Learning Course with TensorFlow
  • MongoDB
  • AWS
  • Microsoft Azure Training

Self-paced Courses:

  • Apache HBase
  • Apache Cassandra
  • Couchbase
  • Machine Learning
  • Solr
  • Linux
  • Java
  • Apache Kafka
  • SQL

What you will learn:

  • Introduction to Hadoop
  • Detailed MapReduce and HDFS
  • Hive, Pig, Sqoop, Flume and Apache HBase
  • Real-time analytics with Spark
  • Prediction and analysis through clustering
  • Deploying recommender system
  • SAS advanced analytics and R programming
  • Linear and logistic regression
  • Designing and Developing NoSQL applications
  • Mastering Artificial Intelligence Algorithms and their practical use cases

Who should take this course:

  • Big Data and Data Science Professionals and Software Developers
  • Business Intelligence Professionals, Information Architects and Project Managers
  • Those looking to make a career in Big Data and Data Science

There are no prerequisites for taking up this training program.

  • Global Big Data market to reach $122 billion in revenue by 2025 – Frost & Sullivan
  • The US alone would face a shortage of 1.4–1.9 million Big Data Analysts in the next two years – McKinsey

This Intellipaat training program has been created keeping in mind the needs of the industry. You will gain mastery in the complete aspects of Data Science and Hadoop ecosystem to take on various roles and responsibilities in the Big Data and Data Science domains at top-notch salaries.

Big Data Hadoop Training Review

John Chioles

Ritesh Bhagwat

Mr Yoga

Dileep & Ajay

Sagar

Ashok Guntupalli

Kavita Mehra

Hadoop Developer at TCS

The classes were highly interactive and practice-oriented. The office staff was cordial and cooperative. Every teaching session was recorded and put online by the institute each day, which was really helpful. The trainer was very patient and was able to solve, or give hints for solving, all the questions posed to him.

Sameer Gupta

Business Intelligence Consultant at IBM

I enjoyed this course from the very first session. The content guides you from the fundamentals to the advanced level, with practical knowledge, in just a few days of training.

Vikrant Singh

Big Data Analytics

It was a wonderful experience learning from Intellipaat trainers. The trainers were hands-on and provided real-time scenarios. For learning cutting-edge and latest technologies, Intellipaat is the right place.

Abhimanyu Balgopal

Product Engineer (BigData)

As a Big Data Engineer, this master's course doubled my interest in various technologies, specifically Hadoop, Spark, Storm, Scala, and others. Very beneficial for those who are passionate about Big Data and Data Science and want to learn everything in one place. The courses in this master's program are well constructed and each topic is well explained. Thanks Intellipaat!!!

Narendra Kumar

Data Scientist at PropTiger.com

Nothing better than a master like this! Being a Data Scientist, I could gain insights into various Big Data platforms like Hadoop, Spark and Scala, which has really enriched my skill set and gave me an edge amongst coworkers. The idea of learning the most demanding Big Data and Data Science technologies through a single course is just wonderful. The trainers are doing a great job. I just love the Integration of various technologies together.

Course Content

Module 01 - Hadoop Installation and Setup

1.1 The architecture of Hadoop cluster
1.2 What are High Availability and Federation?
1.3 How to set up a production cluster?
1.4 Various shell commands in Hadoop
1.5 Understanding configuration files in Hadoop
1.6 Installing a single node cluster with Cloudera Manager
1.7 Understanding Spark, Scala, Sqoop, Pig, and Flume

Module 02 - Introduction to Big Data Hadoop and Understanding HDFS and MapReduce

2.1 Introducing Big Data and Hadoop
2.2 What is Big Data and where does Hadoop fit in?
2.3 Two important Hadoop ecosystem components, namely, MapReduce and HDFS
2.4 In-depth Hadoop Distributed File System – Replications, Block Size, Secondary Name node, High Availability and in-depth YARN – resource manager and node manager

Hands-on Exercise:

1. HDFS working mechanism
2. Data replication process
3. How to determine the size of the block?
4. Understanding a data node and name node

Module 03 - Deep Dive in MapReduce

3.1 Learning the working mechanism of MapReduce
3.2 Understanding the mapping and reducing stages in MR
3.3 Various terminologies in MR like Input Format, Output Format, Partitioners, Combiners, Shuffle, and Sort

Hands-on Exercise:

1. How to write a WordCount program in MapReduce?
2. How to write a Custom Partitioner?
3. What is a MapReduce Combiner?
4. How to run a job in a local job runner
5. Deploying a unit test
6. What is a map side join and reduce side join?
7. What is a tool runner?
8. How to use counters, dataset joining with map side, and reduce side joins?
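
The WordCount, Combiner, and Shuffle and Sort items above can be sketched in a few lines of plain Python. This is only an illustration of the data flow, not Hadoop's Java API: the function names are hypothetical, and each input line stands in for one mapper's input split.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output before the shuffle."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def shuffle(all_pairs):
    """Shuffle and sort: group values by key, with keys in sorted order."""
    grouped = defaultdict(list)
    for word, count in all_pairs:
        grouped[word].append(count)
    return sorted(grouped.items())


def reduce_phase(word, counts):
    """Reducer: sum the partial counts for one word."""
    return word, sum(counts)


def word_count(lines):
    combined = []
    for line in lines:  # assumption: one line = one mapper's input split
        combined.extend(combine(map_phase(line)))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(combined))


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

The combiner does the same summing as the reducer, only locally, which is why it reduces the number of pairs that have to cross the network during the shuffle.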

Module 04 - Introduction to Hive

4.1 Introducing Hadoop Hive
4.2 Detailed architecture of Hive
4.3 Comparing Hive with Pig and RDBMS
4.4 Working with Hive Query Language
4.5 Creation of a database, table, group by and other clauses
4.6 Various types of Hive tables, HCatalog
4.7 Storing the Hive Results, Hive partitioning, and Buckets

Hands-on Exercise:

1. Database creation in Hive
2. Dropping a database
3. Hive table creation
4. How to change the database?
5. Data loading
6. Dropping and altering table
7. Pulling data by writing Hive queries with filter conditions
8. Table partitioning in Hive
9. What is a group by clause?

Module 05 - Advanced Hive and Impala

5.1 Indexing in Hive
5.2 The Map Side Join in Hive
5.3 Working with complex data types
5.4 The Hive user-defined functions
5.5 Introduction to Impala
5.6 Comparing Hive with Impala
5.7 The detailed architecture of Impala

Hands-on Exercise: 

1. How to work with Hive queries?
2. The process of joining the table and writing indexes
3. External table and sequence table deployment
4. Data storage in a different table

Module 06 - Introduction to Pig

6.1 Apache Pig introduction and its various features
6.2 Various data types and schema in Pig
6.3 The available functions in Pig, Bags, Tuples, and Fields

Hands-on Exercise: 

1. Working with Pig in MapReduce and local mode
2. Loading of data
3. Limiting data to 4 rows
4. Storing the data into files and working with Group By, Filter By, Distinct, Cross, and Split in Pig

Module 07 - Flume, Sqoop and HBase

7.1 Apache Sqoop introduction
7.2 Importing and exporting data
7.3 Performance improvement with Sqoop
7.4 Sqoop limitations
7.5 Introduction to Flume and understanding the architecture of Flume
7.6 What is HBase and the CAP theorem?

Hands-on Exercise: 

1. Working with Flume to generate Sequence Number and consume it
2. Using the Flume Agent to consume the Twitter data
3. Using AVRO to create Hive Table
4. AVRO with Pig
5. Creating Table in HBase
6. Deploying Disable, Scan, and Enable Table

Module 08 - Writing Spark Applications Using Scala

8.1 Using Scala for writing Apache Spark applications
8.2 Detailed study of Scala
8.3 The need for Scala
8.4 The concept of object-oriented programming
8.5 Executing the Scala code
8.6 Various classes in Scala like getters, setters, constructors, abstract, extending objects, overriding methods
8.7 The Java and Scala interoperability
8.8 The concept of functional programming and anonymous functions
8.9 Bobsrockets package and comparing the mutable and immutable collections
8.10 Scala REPL, Lazy Values, Control Structures in Scala, Directed Acyclic Graph (DAG), first Spark application using SBT/Eclipse, Spark Web UI, Spark in Hadoop ecosystem.

Hands-on Exercise:

1. Writing Spark application using Scala
2. Understanding the robustness of Scala for Spark real-time analytics operation

Module 09 - Spark framework

9.1 Detailed Apache Spark and its various features
9.2 Comparing with Hadoop
9.3 Various Spark components
9.4 Combining HDFS with Spark and Scalding
9.5 Introduction to Scala
9.6 Importance of Scala and RDD

Hands-on Exercise: 

1. The Resilient Distributed Dataset (RDD) in Spark
2. How does it help to speed up Big Data processing?

Module 10 - RDD in Spark

10.1 Understanding the Spark RDD operations
10.2 Comparison of Spark with MapReduce
10.3 What is a Spark transformation?
10.4 Loading data in Spark
10.5 Types of RDD operations viz. transformation and action
10.6 What is a Key/Value pair?

Hands-on Exercise: 

1. How to deploy RDD with HDFS?
2. Using the in-memory dataset
3. Using file for RDD
4. How to define the base RDD from an external file?
5. Deploying RDD via transformation
6. Using the Map and Reduce functions
7. Working on word count and count log severity

Module 11 - Data Frames and Spark SQL

11.1 The detailed Spark SQL
11.2 The significance of SQL in Spark for working with structured data processing
11.3 Spark SQL JSON support
11.4 Working with XML data and parquet files
11.5 Creating Hive Context
11.6 Writing Data Frame to Hive
11.7 How to read data over JDBC?
11.8 Significance of a Spark data frame
11.9 How to create a data frame?
11.10 What is manual schema inference?
11.11 Work with CSV files, JDBC table reading, data conversion from Data Frame to JDBC, Spark SQL user-defined functions, shared variable, and accumulators
11.12 How to query and transform data in Data Frames?
11.13 How does a data frame provide the benefits of both Spark RDD and Spark SQL?
11.14 Deploying Hive on Spark as the execution engine

Hands-on Exercise:

1. Data querying and transformation using Data Frames
2. Finding out the benefits of Data Frames over Spark SQL and Spark RDD

Module 12 - Machine Learning Using Spark (MLlib)

12.1 Introduction to Spark MLlib
12.2 Understanding various algorithms
12.3 What is a Spark iterative algorithm?
12.4 Spark graph processing analysis
12.5 Introducing Machine Learning
12.6 K-Means clustering
12.7 Spark variables like shared and broadcast variables
12.8 What are accumulators?
12.9 Various ML algorithms supported by MLlib
12.10 Linear regression, logistic regression, decision tree, random forest, and K-means clustering techniques

Hands-on Exercise: 

1. Building a recommendation engine

Module 13 - Integrating Apache Flume and Apache Kafka

13.1 Why Kafka?
13.2 What is Kafka?
13.3 Kafka architecture
13.4 Kafka workflow
13.5 Configuring Kafka cluster
13.6 Basic operations
13.7 Kafka monitoring tools
13.8 Integrating Apache Flume and Apache Kafka

Hands-on Exercise:

1. Configuring Single Node Single Broker Cluster
2. Configuring Single Node Multi Broker Cluster
3. Producing and consuming messages
4. Integrating Apache Flume and Apache Kafka.

Module 14 - Spark Streaming

14.1 Introduction to Spark streaming
14.2 The architecture of Spark streaming
14.3 Working with the Spark streaming program
14.4 Processing data using Spark streaming
14.5 Requesting count and DStream
14.6 Multi-batch and sliding window operations
14.7 Working with advanced data sources
14.8 Features of Spark streaming
14.9 Spark Streaming workflow
14.10 Initializing StreamingContext
14.11 Discretized Streams (DStreams)
14.12 Input DStreams and Receivers
14.13 Transformations on DStreams
14.14 Output Operations on DStreams
14.15 Windowed operators and their uses
14.16 Important Windowed operators and Stateful operators

Hands-on Exercise:

1. Twitter Sentiment analysis
2. Streaming using Netcat server
3. Kafka-Spark streaming
4. Spark-Flume streaming
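
The multi-batch and sliding window operations listed in this module can be illustrated without a cluster. The sketch below is plain Python, not the Spark Streaming API: `windowed_counts` is a hypothetical helper, and both the window length and the slide interval are measured in micro-batches, mirroring the Spark rule that both must be multiples of the batch interval.

```python
from collections import Counter


def windowed_counts(batches, window_length, slide_interval):
    """Count items over sliding windows of micro-batches.

    A result is produced every `slide_interval` batches, and each result
    covers the most recent `window_length` batches.
    """
    results = []
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        window = Counter()
        for batch in batches[start:end]:
            window.update(batch)
        results.append(dict(window))
    return results


# Four micro-batches of events (hypothetical sample data).
batches = [["a", "b"], ["a"], ["b", "b"], ["c"]]
windows = windowed_counts(batches, window_length=3, slide_interval=2)
print(windows)
```

Because the window (3 batches) is longer than the slide (2 batches), consecutive windows overlap by one batch, so the second batch is counted in both results.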

Module 15 - Hadoop Administration – Multi-node Cluster Setup Using Amazon EC2

15.1 Create a 4-node Hadoop cluster setup
15.2 Running the MapReduce Jobs on the Hadoop cluster
15.3 Successfully running the MapReduce code
15.4 Working with the Cloudera Manager setup

Hands-on Exercise:

1. The method to build a multi-node Hadoop cluster using an Amazon EC2 instance
2. Working with the Cloudera Manager

Module 16 - Hadoop Administration – Cluster Configuration

16.1 Overview of Hadoop configuration
16.2 The importance of Hadoop configuration file
16.3 The various parameters and values of configuration
16.4 The HDFS parameters and MapReduce parameters
16.5 Setting up the Hadoop environment
16.6 The Include and Exclude configuration files
16.7 The administration and maintenance of name node, data node directory structures, and files
16.8 What is a File system image?
16.9 Understanding Edit log

Hands-on Exercise:

1. The process of performance tuning in MapReduce

Module 17 - Hadoop Administration – Maintenance, Monitoring and Troubleshooting

17.1 Introduction to the checkpoint procedure, name node failure
17.2 How to ensure the recovery procedure, Safe Mode, Metadata and Data backup, various potential problems and solutions, what to look for and how to add and remove nodes

Hands-on Exercise:

1. How to go about ensuring the MapReduce File System Recovery for different scenarios
2. JMX monitoring of the Hadoop cluster
3. How to use the logs and stack traces for monitoring and troubleshooting
4. Using the Job Scheduler for scheduling jobs in the same cluster
5. Getting the MapReduce job submission flow
6. FIFO schedule
7. Getting to know the Fair Scheduler and its configuration

Module 18 - ETL Connectivity with Hadoop Ecosystem (Self-Paced)

18.1 How do ETL tools work in the Big Data industry?
18.2 Introduction to ETL and data warehousing
18.3 Working with prominent use cases of Big Data in the ETL industry
18.4 End-to-end ETL PoC showing Big Data integration with ETL tool

Hands-on Exercise:

1. Connecting to HDFS from ETL tool
2. Moving data from Local system to HDFS
3. Moving data from DBMS to HDFS
4. Working with Hive with ETL Tool
5. Creating MapReduce job in ETL tool

Module 19 - Project Solution Discussion and Cloudera Certification Tips and Tricks

19.1 Working towards the solution of the Hadoop project
19.2 Its problem statements and the possible solution outcomes
19.3 Preparing for the Cloudera certifications
19.4 Points to focus on for scoring the highest marks
19.5 Tips for cracking Hadoop interview questions

Hands-on Exercise:

1. The project of a real-world high value Big Data Hadoop application
2. Getting the right solution based on the criteria set by the Intellipaat team

Following topics will be available only in self-paced mode:

Module 20 - Hadoop Application Testing

20.1 Importance of testing
20.2 Unit testing, Integration testing, Performance testing, Diagnostics, Nightly QA test, Benchmark and end-to-end tests, Functional testing, Release certification testing, Security testing, Scalability testing, Commissioning and Decommissioning of data nodes testing, Reliability testing, and Release testing

Module 21 - Roles and Responsibilities of Hadoop Testing Professional

21.1 Understanding the Requirement
21.2 Preparation of the Testing Estimation
21.3 Test Cases, Test Data, Test Bed Creation, Test Execution, Defect Reporting, Defect Retest, Daily Status report delivery, Test completion, ETL testing at every stage (HDFS, Hive and HBase) while loading the input (logs, files, records, etc.) using Sqoop/Flume, which includes but is not limited to data verification, Reconciliation, User Authorization and Authentication testing (Groups, Users, Privileges, etc.), reporting defects to the development team or manager and driving them to closure
21.4 Consolidating all the defects and creating defect reports
21.5 Validating new features and issues in Core Hadoop

Module 22 - Framework Called MRUnit for Testing of MapReduce Programs

22.1 Reporting defects to the development team or manager and driving them to closure
22.2 Consolidating all the defects and creating defect reports
22.3 Creating a testing framework called MRUnit for testing of MapReduce programs

Module 23 - Unit Testing

23.1 Automation testing using Oozie
23.2 Data validation using the QuerySurge tool

Module 24 - Test Execution

24.1 Test plan for HDFS upgrade
24.2 Test automation and result

Module 25 - Test Plan Strategy and Writing Test Cases for Testing Hadoop Application

25.1 Test, install and configure

Big Data Hadoop Course Projects

Working with MapReduce, Hive, and Sqoop

In this project, you will import data into HDFS using Sqoop for data analysis. Sqoop will transfer the data from an RDBMS to Hadoop. You will code in Hive query language and carry out data querying and analysis. You will acquire an understanding of Hive and Sqoop after completion of this project.

Work on MovieLens Data For Finding the Top Movies

Create the top-ten-movies list using the MovieLens data. For this project, you will use a MapReduce program for working on the data file, Apache Pig for analyzing data, and Apache Hive for data warehousing and querying. You will be working with distributed datasets.

Hadoop YARN Project: End-to-End PoC

Bring the daily incremental data into the Hadoop Distributed File System. As part of the project, you will be using Sqoop commands to bring the data into HDFS, working with the end-to-end flow of transaction data, and the data from HDFS. You will work on a live Hadoop YARN cluster. You will work on the YARN central resource manager.

Table Partitioning in Hive

In this project, you will learn how to improve the query speed using Hive data partitioning. You will get hands-on experience in partitioning of Hive tables manually, deploying single SQL execution in dynamic partitioning, and bucketing of data to break it into manageable chunks.
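
Why partitioning and bucketing speed up queries can be seen in a small simulation. The sketch below is plain Python rather than HiveQL: the column names and rows are hypothetical, and `zlib.crc32` is used as a stand-in deterministic hash, not Hive's actual bucketing function.

```python
import zlib
from collections import defaultdict


def bucket_for(key, num_buckets):
    """Map a key to a bucket index with a deterministic stand-in hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def partition_and_bucket(rows, partition_col, bucket_col, num_buckets):
    """Group rows by (partition value, bucket index), like directories and files."""
    layout = defaultdict(list)
    for row in rows:
        key = (row[partition_col], bucket_for(row[bucket_col], num_buckets))
        layout[key].append(row)
    return dict(layout)


rows = [
    {"country": "IN", "user_id": 1, "amount": 10},
    {"country": "IN", "user_id": 2, "amount": 20},
    {"country": "US", "user_id": 1, "amount": 30},
]
layout = partition_and_bucket(rows, "country", "user_id", num_buckets=4)

# A query filtered on country only has to read that partition's chunks.
in_rows = [
    row
    for (country, _bucket), chunk in layout.items()
    if country == "IN"
    for row in chunk
]
print(len(in_rows))
```

Partitioning prunes whole groups of rows by column value, while bucketing splits each partition into a fixed number of evenly sized chunks.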

Connecting Pentaho with Hadoop Ecosystem

Deploy ETL for data analysis activities. In this project, you will challenge your working knowledge of ETL and Business Intelligence. You will configure Pentaho to work with Hadoop distribution as well as load, transform, and extract data into the Hadoop cluster.

Multi-node Cluster Setup

Set up a Hadoop real-time cluster on Amazon EC2. The project will involve installing and configuring Hadoop. You will need to run a Hadoop multi-node using a 4-node cluster on Amazon EC2 and deploy a MapReduce job on the Hadoop cluster. Java will need to be installed as a prerequisite for running Hadoop.

Hadoop Testing Using MRUnit

In this project, you will be required to test MapReduce applications. You will write JUnit tests using MRUnit for MapReduce applications. You will also mock static methods using PowerMock and Mockito and implement a MapReduce Driver for testing the map and reduce pair.

Hadoop Web Log Analytics

Derive insights from web log data. The project involves the aggregation of log data, implementation of Apache Flume for data transportation, and processing of data and generating analytics. You will learn to use workflow and data cleansing using MapReduce, Pig, or Spark.

Hadoop Maintenance

Through this project, you will learn how to administer, maintain, and manage a Hadoop cluster. You will be working with the name node directory structure, audit logging, data node block scanner, balancer, failover, fencing, DistCp, and Hadoop file formats.

Twitter Sentiment Analysis

Find out how people reacted to India's demonetization move by analyzing their tweets. You will have to download the tweets, load them into Pig storage, divide the tweets into words to calculate sentiment, rate the words from +5 to −5 using the AFINN dictionary, filter them, and analyze the sentiment.
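
The word-rating step described above can be sketched in plain Python. The mini-lexicon below is a hypothetical stand-in for a real word-score dictionary on the +5 to −5 scale, and averaging the rated words is an assumption about how the per-tweet score is combined; the project itself uses Pig.

```python
import re

# Hypothetical mini-lexicon; a real word-score dictionary has far more entries.
LEXICON = {"good": 3, "great": 3, "bad": -3, "terrible": -3, "support": 2, "chaos": -2}


def tokenize(tweet):
    """Split a tweet into lowercase words."""
    return re.findall(r"[a-z']+", tweet.lower())


def sentiment_score(tweet, lexicon=LEXICON):
    """Average the scores of the words that appear in the lexicon."""
    rated = [lexicon[word] for word in tokenize(tweet) if word in lexicon]
    return sum(rated) / len(rated) if rated else 0.0


def label(score):
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "Great move, I support it",
    "Terrible chaos at the banks",
    "Went to the ATM today",
]
labels = [label(sentiment_score(tweet)) for tweet in tweets]
print(labels)
```

Words missing from the lexicon are simply filtered out, so a tweet with no rated words is treated as neutral.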

Analyzing IPL T20 Cricket

This project will require you to analyze an entire cricket match and get any details of the match. You will need to load the IPL dataset into HDFS. You will then analyze that data using Apache Pig or Hive. Based on the user queries, the system will have to give the right output.

Movie Recommendation

Recommend the most appropriate movie to a user based on their taste. This is a hands-on Apache Spark project, which will include the creation of collaborative filtering, regression, clustering, and dimensionality reduction. You will need to make use of the Apache Spark MLlib component and statistical analysis.
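
To make the idea of collaborative filtering concrete, here is a small user-based sketch in plain Python. It is deliberately not the Spark MLlib approach (MLlib's collaborative filtering is based on alternating least squares); it is a simpler neighborhood method, and the users, movies, and ratings are invented for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity over the movies both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = sqrt(sum(u[m] ** 2 for m in shared))
    norm_v = sqrt(sum(v[m] ** 2 for m in shared))
    return dot / (norm_u * norm_v)


def recommend(target, ratings, top_n=1):
    """Predict ratings for unseen movies as a similarity-weighted average."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], other_ratings)
        if sim <= 0:
            continue
        for movie, rating in other_ratings.items():
            if movie in ratings[target]:
                continue  # only recommend movies the target has not rated
            scores[movie] = scores.get(movie, 0.0) + sim * rating
            weights[movie] = weights.get(movie, 0.0) + sim
    predicted = {movie: scores[movie] / weights[movie] for movie in scores}
    return sorted(predicted, key=lambda m: (-predicted[m], m))[:top_n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"Alien": 5, "Heat": 4},
    "bob": {"Alien": 5, "Heat": 4, "Up": 5, "Cars": 2},
    "cat": {"Alien": 1, "Heat": 2, "Cars": 5},
}
picks = recommend("ann", ratings)
print(picks)
```

Users whose ratings resemble the target's get more weight, so the suggestion comes mainly from the most similar user.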

Twitter API Integration for Tweet Analysis

Analyze the user sentiment based on a tweet. In this Twitter analysis project, you will integrate the Twitter API and use Python or PHP for developing the essential server-side codes. You will carry out filtering, parsing, and aggregation depending on the tweet analysis requirement.

Data Exploration Using Spark SQL – Wikipedia Data Set

In this project, you will be making use of the Spark SQL tool for analyzing Wikipedia data. You will be integrating Spark SQL for batch analysis, Machine Learning, visualizing, and processing of data and ETL processes, along with real-time analysis of data.

Scala Course Content

Module 01 - Introduction to Scala

1.1 Introducing Scala
1.2 Deployment of Scala for Big Data applications and Apache Spark analytics
1.3 Scala REPL, lazy values, and control structures in Scala
1.4 Directed Acyclic Graph (DAG)
1.5 First Spark application using SBT/Eclipse
1.6 Spark Web UI
1.7 Spark in the Hadoop ecosystem.

Module 02 - Pattern Matching

2.1 The importance of Scala
2.2 The concept of REPL (Read Evaluate Print Loop)
2.3 Deep dive into Scala pattern matching
2.4 Type inference, higher-order functions, currying, traits, application space, and Scala for data analysis

Module 03 - Executing the Scala Code

3.1 Learning about the Scala Interpreter
3.2 Static object timer in Scala and testing string equality in Scala
3.3 Implicit classes in Scala
3.4 The concept of currying in Scala
3.5 Various classes in Scala

Module 04 - Classes Concept in Scala

4.1 Learning about the Classes concept
4.2 Understanding the constructor overloading
4.3 Various abstract classes
4.4 The hierarchy types in Scala
4.5 The concept of object equality
4.6 The val and var methods in Scala

Module 05 - Case Classes and Pattern Matching

5.1 Understanding sealed traits, wild, constructor, tuple, variable pattern, and constant pattern

Module 06 - Concepts of Traits with Example

6.1 Understanding traits in Scala
6.2 The advantages of traits
6.3 Linearization of traits
6.4 The Java equivalent
6.5 Avoiding boilerplate code

Module 07 - Scala–Java Interoperability

7.1 Implementation of traits in Scala and Java
7.2 Handling the extension of multiple traits

Module 08 - Scala Collections

8.1 Introduction to Scala collections
8.2 Classification of collections
8.3 The difference between iterator and iterable in Scala
8.4 Example of list sequence in Scala

Module 09 - Mutable Collections Vs. Immutable Collections

9.1 The two types of collections in Scala
9.2 Mutable and immutable collections
9.3 Understanding lists and arrays in Scala
9.4 The list buffer and array buffer
9.5 Queue in Scala
9.6 Double-ended queue (Deque), Stacks, Sets, Maps, and Tuples in Scala

Module 10 - Use Case Bobsrockets Package

10.1 Introduction to Scala packages and imports
10.2 The selective imports
10.3 The Scala test classes
10.4 Introduction to JUnit test class
10.5 JUnit interface via JUnit 3 suite for Scala test
10.6 Packaging of Scala applications in the directory structure
10.7 Examples of Spark Split and Spark Scala

Spark Course Content

Module 11 - Introduction to Spark

11.1 Introduction to Spark
11.2 Spark overcomes the drawbacks of working on MapReduce
11.3 Understanding in-memory MapReduce
11.4 Interactive operations on MapReduce
11.5 Spark stack, fine vs. coarse-grained update, Spark Hadoop YARN, HDFS Revision, and YARN Revision
11.6 The overview of Spark and how it is better than Hadoop
11.7 Deploying Spark without Hadoop
11.8 Spark history server and Cloudera distribution

Module 12 - Spark Basics

12.1 Spark installation guide
12.2 Spark configuration
12.3 Memory management
12.4 Executor memory vs. driver memory
12.5 Working with Spark Shell
12.6 The concept of resilient distributed datasets (RDD)
12.7 Learning to do functional programming in Spark
12.8 The architecture of Spark

Module 13 - Working with RDDs in Spark

13.1 Spark RDD
13.2 Creating RDDs
13.3 RDD partitioning
13.4 Operations and transformation in RDD
13.5 Deep dive into Spark RDDs
13.6 The RDD general operations
13.7 Read-only partitioned collection of records
13.8 Using the concept of RDD for faster and efficient data processing
13.9 RDD actions such as collect, count, collectAsMap, and saveAsTextFile, and pair RDD functions

Module 14 - Aggregating Data with Pair RDDs

14.1 Understanding the concept of key-value pair in RDDs
14.2 Learning how Spark makes MapReduce operations faster
14.3 Various operations of RDD
14.4 MapReduce interactive operations
14.5 Fine and coarse-grained update
14.6 Spark stack

Module 15 - Writing and Deploying Spark Applications

15.1 Comparing the Spark applications with Spark Shell
15.2 Creating a Spark application using Scala or Java
15.3 Deploying a Spark application
15.4 Scala built application
15.5 Creation of the mutable list, set and set operations, list, tuple, and concatenating list
15.6 Creating an application using SBT
15.7 Deploying an application using Maven
15.8 The web user interface of Spark application
15.9 A real-world example of Spark
15.10 Configuring Spark

Module 16 - Parallel Processing

16.1 Learning about Spark parallel processing
16.2 Deploying on a cluster
16.3 Introduction to Spark partitions
16.4 File-based partitioning of RDDs
16.5 Understanding of HDFS and data locality
16.6 Mastering the technique of parallel operations
16.7 Comparing repartition and coalesce
16.8 RDD actions

Module 17 - Spark RDD Persistence

17.1 The execution flow in Spark
17.2 Understanding the RDD persistence overview
17.3 Spark execution flow, and Spark terminology
17.4 Distributed shared memory vs. RDD
17.5 RDD limitations
17.6 Spark shell arguments
17.7 Distributed persistence
17.8 RDD lineage
17.9 Key-value pair for sorting implicit conversions like CountByKey, ReduceByKey, SortByKey, and AggregateByKey

Module 18 - Spark MLlib

18.1 Introduction to Machine Learning
18.2 Types of Machine Learning
18.3 Introduction to MLlib
18.4 Various ML algorithms supported by MLlib
18.5 Linear regression, logistic regression, decision tree, random forest, and K-means clustering techniques

Hands-on Exercise: 
1. Building a Recommendation Engine

Module 19 - Integrating Apache Flume and Apache Kafka

19.1 Why Kafka and what is Kafka?
19.2 Kafka architecture
19.3 Kafka workflow
19.4 Configuring Kafka cluster
19.5 Operations
19.6 Kafka monitoring tools
19.7 Integrating Apache Flume and Apache Kafka

Hands-on Exercise: 
1. Configuring Single Node Single Broker Cluster
2. Configuring Single Node Multi Broker Cluster
3. Producing and consuming messages
4. Integrating Apache Flume and Apache Kafka

Module 20 - Spark Streaming

20.1 Introduction to Spark Streaming
20.2 Features of Spark Streaming
20.3 Spark Streaming workflow
20.4 Initializing StreamingContext, discretized Streams (DStreams), input DStreams and Receivers
20.5 Transformations on DStreams, output operations on DStreams, windowed operators and why they are useful
20.6 Important windowed operators and stateful operators

Hands-on Exercise: 
1. Twitter Sentiment analysis
2. Streaming using Netcat server
3. Kafka–Spark streaming
4. Spark–Flume streaming

Module 21 - Improving Spark Performance

21.1 Introduction to various variables in Spark like shared variables and broadcast variables
21.2 Learning about accumulators
21.3 The common performance issues
21.4 Troubleshooting the performance problems

Module 22 - Spark SQL and Data Frames

22.1 Learning about Spark SQL
22.2 The context of SQL in Spark for providing structured data processing
22.3 JSON support in Spark SQL
22.4 Working with XML data
22.5 Parquet files
22.6 Creating Hive context
22.7 Writing data frame to Hive
22.8 Reading data over JDBC
22.9 Understanding the data frames in Spark
22.10 Creating Data Frames
22.11 Manual inferring of schema
22.12 Working with CSV files
22.13 Reading JDBC tables
22.14 Data frame to JDBC
22.15 User-defined functions in Spark SQL
22.16 Shared variables and accumulators
22.17 Learning to query and transform data in data frames
22.18 Data frame provides the benefit of both Spark RDD and Spark SQL
22.19 Deploying Hive on Spark as the execution engine

Module 23 - Scheduling/Partitioning

23.1 Learning about the scheduling and partitioning in Spark
23.2 Hash partition
23.3 Range partition
23.4 Scheduling within and around applications
23.5 Static partitioning, dynamic sharing, and fair scheduling
23.6 Map partition with index, the Zip, and GroupByKey
23.7 Spark master high availability, standby masters with ZooKeeper, single-node recovery with the local file system and high order functions

Spark and Scala Projects

Movie Recommendation

Deploy Apache Spark for a movie recommendation system. Through this project, you will be working with Spark MLlib, collaborative filtering, clustering, regression, and dimensionality reduction. By the completion of this project, you will be proficient in working with streaming data, sampling, testing, and statistics.

Twitter API Integration for Tweet Analysis

Integrate the Twitter API for analyzing tweets. You can use any scripting language, such as PHP, Ruby, or Python, to send requests to the Twitter API and get the results in JSON format. You will have to perform aggregation, filtering, and parsing as required for the tweet analysis.

Data Exploration Using Spark SQL – Wikipedia Data

This project will allow you to work with Spark SQL and combine it with ETL applications, real-time analysis of data, performing batch analysis, deploying Machine Learning, creating visualizations, and processing of graphs.

Module 01 - Introduction to Data Science with R

1.1 What is Data Science?
1.2 Significance of Data Science in today’s data-driven world, applications of Data Science, lifecycle of Data Science, and its components
1.3 Introduction to Big Data Hadoop, Machine Learning, and Deep Learning
1.4 Introduction to R programming and RStudio

Hands-on Exercise:

1. Installation of RStudio
2. Implementing simple mathematical operations and logic using R operators, loops, if statements, and switch cases

Module 02 - Data Exploration

2.1 Introduction to data exploration
2.2 Importing and exporting data to/from external sources
2.3 What are exploratory data analysis and data importing?
2.4 DataFrames, working with them, accessing individual elements, vectors, factors, operators, in-built functions, conditional and looping statements, user-defined functions, and data types

Hands-on Exercise:

1. Accessing individual elements of customer churn data
2. Modifying and extracting results from the dataset using user-defined functions in R

Module 03 - Data Manipulation

3.1 Need for data manipulation
3.2 Introduction to the dplyr package
3.3 Selecting one or more columns with select(), filtering records on the basis of a condition with filter(), adding new columns with mutate(), sampling, and counting
3.4 Combining different functions with the pipe operator and implementing SQL-like operations with sqldf

Hands-on Exercise:

1. Implementing dplyr
2. Performing various operations for manipulating data and storing it

Module 04 - Data Visualization

4.1 Introduction to visualization
4.2 Different types of graphs, the grammar of graphics, the ggplot2 package, categorical distribution with geom_bar(), numerical distribution with geom_histogram(), building frequency polygons with geom_freqpoly(), and making a scatterplot with geom_point()
4.3 Multivariate analysis with geom_boxplot
4.4 Univariate analysis with a barplot, a histogram and a density plot, and multivariate distribution
4.5 Creating barplots for categorical variables using geom_bar(), and adding themes with the theme() layer
4.6 Visualization with plotly, frequency plots with geom_freqpoly(), multivariate distribution with scatter plots and smooth lines, continuous distribution vs categorical distribution with box-plots, and sub grouping plots
4.7 Working with co-ordinates and themes to make graphs more presentable, understanding plotly and various plots, and visualization with ggvis
4.8 Geographic visualization with ggmap() and building web applications with shinyR

Hands-on Exercise:

1. Creating data visualization to understand the customer churn ratio using ggplot2 charts
2. Using plotly for importing and analyzing data
3. Visualizing tenure, monthly charges, total charges, and other individual columns using a scatter plot

Module 05 - Introduction to Statistics

5.1 Why do we need statistics?
5.2 Categories of statistics, statistical terminology, types of data, measures of central tendency, and measures of spread
5.3 Correlation and covariance, standardization and normalization, probability and its types, hypothesis testing, chi-square testing, ANOVA, normal distribution, and binomial distribution

Hands-on Exercise:

1. Building a statistical analysis model that uses quantification, representations, and experimental data
2. Reviewing, analyzing, and drawing conclusions from the data

Module 06 - Machine Learning

6.1 Introduction to Machine Learning
6.2 Introduction to linear regression, predictive modeling, simple linear regression vs multiple linear regression, concepts, formulas, assumptions, and residuals in Linear Regression, and building a simple linear model
6.3 Predicting results and finding the p-value and an introduction to logistic regression
6.4 Comparing linear regression with logistic regression and bivariate logistic regression with multivariate logistic regression
6.5 Confusion matrix and the accuracy of a model, understanding the fit of the model, threshold evaluation with ROCR, and using qqnorm() and qqline()
6.6 Understanding the summary results with null hypothesis, F-statistic, and building linear models with multiple independent variables

Hands-on Exercise:

1. Modeling the relationship within data using linear predictor functions
2. Implementing linear and logistic regression in R by building a model with ‘tenure’ as the dependent variable

Module 07 - Logistic Regression

7.1 Introduction to logistic regression
7.2 Logistic regression concepts, linear vs logistic regression, and math behind logistic regression
7.3 Detailed formulas, logit function and odds, bivariate logistic regression, and Poisson regression
7.4 Building a simple binomial model and predicting the result, making a confusion matrix for evaluating the accuracy, true positive rate, false positive rate, and threshold evaluation with ROCR
7.5 Finding out the right threshold by building the ROC plot, cross validation, multivariate logistic regression, and building logistic models with multiple independent variables
7.6 Real-life applications of logistic regression

Hands-on Exercise:

1. Implementing predictive analytics by describing data
2. Explaining the relationship between one dependent binary variable and one or more binary variables
3. Using glm() to build a model, with ‘Churn’ as the dependent variable

Module 08 - Decision Trees and Random Forest

8.1 What is classification? Different classification techniques
8.2 Introduction to decision trees
8.3 Algorithm for decision tree induction and building a decision tree in R
8.4 Confusion matrix and regression trees vs classification trees
8.5 Introduction to bagging
8.6 Random forest and implementing it in R
8.7 What is Naive Bayes? Computing probabilities
8.8 Understanding the concepts of Impurity function, Entropy, Gini index, and Information gain for the right split of node
8.9 Overfitting, pruning, pre-pruning, post-pruning, and cost-complexity pruning, pruning a decision tree and predicting values, finding out the right number of trees, and evaluating performance metrics

Hands-on Exercise:

1. Implementing random forest for both regression and classification problems
2. Building a tree, pruning it using ‘churn’ as the dependent variable, and building a random forest with the right number of trees
3. Using ROCR for performance metrics

Module 09 - Unsupervised Learning

9.1 What is Clustering? Its use cases
9.2 What is k-means clustering? What is canopy clustering?
9.3 What is hierarchical clustering?
9.4 Introduction to unsupervised learning
9.5 Feature extraction, clustering algorithms, and the k-means clustering algorithm
9.6 Theoretical aspects of k-means, k-means process flow, k-means in R, implementing k-means, and finding out the right number of clusters using a scree plot
9.7 Dendrograms, understanding hierarchical clustering, and implementing it in R
9.8 Explanation of Principal Component Analysis (PCA) in detail and implementing PCA in R

Hands-on Exercise:

1. Deploying unsupervised learning with R to achieve clustering and dimensionality reduction
2. K-means clustering for visualizing and interpreting results for the customer churn data

Module 10 - Association Rule Mining and Recommendation Engines

10.1 Introduction to association rule mining and MBA
10.2 Measures of association rule mining: Support, confidence, lift, and apriori algorithm, and implementing them in R
10.3 Introduction to recommendation engines
10.4 User-based collaborative filtering and item-based collaborative filtering, and implementing a recommendation engine in R
10.5 Recommendation engine use cases

Hands-on Exercise:

1. Deploying association analysis as a rule-based Machine Learning method
2. Identifying strong rules discovered in databases with measures based on interesting discoveries

Self-paced Course Content

Module 11 - Introduction to Artificial Intelligence

11.1 Introducing Artificial Intelligence and Deep Learning
11.2 What is an artificial neural network? TensorFlow: The computational framework for building AI models
11.3 Fundamentals of building ANN using TensorFlow and working with TensorFlow in R

Module 12 - Time Series Analysis

12.1 What is a time series? The techniques, applications, and components of time series
12.2 Moving average, smoothing techniques, and exponential smoothing
12.3 Univariate time series models and multivariate time series analysis
12.4 ARIMA model
12.5 Time series in R, sentiment analysis in R (Twitter sentiment analysis), and text analysis

Hands-on Exercise:

1. Analyzing time series data
2. Analyzing the sequence of measurements that follow a non-random order to identify the nature of phenomenon and forecast the future values in the series

Module 13 - Support Vector Machine (SVM)

13.1 Introduction to Support Vector Machine (SVM)
13.2 Data classification using SVM
13.3 SVM algorithms using separable and inseparable cases
13.4 Linear SVM for identifying margin hyperplane

Module 14 - Naïve Bayes

14.1 What is the Bayes theorem?
14.2 What is Naïve Bayes Classifier?
14.3 Classification Workflow
14.4 How Naive Bayes classifier works and classifier building in Scikit-Learn
14.5 Building a probabilistic classification model using Naïve Bayes and the zero probability problem

Module 15 - Text Mining

15.1 Introduction to the concepts of text mining
15.2 Text mining use cases and understanding and manipulating the text with ‘tm’ and ‘stringr’
15.3 Text mining algorithms and the quantification of the text
15.4 TF-IDF and after TF-IDF
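
As a rough illustration of the TF-IDF idea listed in this module, the sketch below computes scores with plain Python. The function name tf_idf, the whitespace tokenizer, and the sample sentences are illustrative assumptions, not course material, and it uses one common variant of the formula (term frequency as a share of the document, inverse document frequency as log(N / df)); libraries such as ‘tm’ may weight terms differently.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document (hypothetical helper)."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = ["the cat sat", "the dog barked", "the cat purred"]
for doc, doc_scores in zip(docs, tf_idf(docs)):
    print(doc, doc_scores)
```

A term that appears in every document (here "the") scores zero, while a term unique to one document scores highest, which is the quantification of text that TF-IDF is meant to provide.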

Case Studies

Case Study 01: Market Basket Analysis (MBA)

1.1 This case study is associated with the modeling technique of Market Basket Analysis, where you will learn about loading data, plotting items, and running algorithms.
1.2 It includes finding out the items that go hand in hand and can be clubbed together.
1.3 This is used for various real-world scenarios like a supermarket shopping cart and so on.

Case Study 02: Logistic Regression

2.1 In this case study, you will get a detailed understanding of the advertisement spends of a company that will help drive more sales.
2.2 You will deploy logistic regression to forecast future trends.
2.3 You will detect patterns and uncover insight using the power of R programming.
2.4 Due to this, the future advertisement spends can be decided and optimized for higher revenues.

Case Study 03: Multiple Regression

3.1 You will understand how to compare the miles per gallon (MPG) of a car based on various parameters.
3.2 You will deploy multiple regression and note down the MPG for car make, model, speed, load conditions, etc.
3.3 The case study includes model building, model diagnostic, and checking the ROC curve, among other things.

Case Study 04: Receiver Operating Characteristic (ROC)

4.1 In this case study, you will work with various datasets in R.
4.2 You will deploy data exploration methodologies.
4.3 You will also build scalable models.
4.4 Besides, you will predict the outcome with the highest precision, diagnose the model that you have created with real-world data, and check the ROC curve.

Data Science Projects Covered

Market Basket Analysis

This is an inventory management project where you will find the trends in the data that will help the company to increase sales. In this project, you will be implementing association rule mining, data extraction, and data manipulation for the Market Basket Analysis.

Credit Card Fraud Detection

The project consists of data analysis for various parameters of a banking dataset. You will use the V7 and V4 predictors for analysis, along with data visualization, to find the probability of fraudulent activity.

Loan Approval Prediction

In this project, you will use the banking dataset for data analysis, data cleaning, data preprocessing, and data visualization. You will implement algorithms such as Principal Component Analysis and Naive Bayes after data analysis to predict the approval rate of a loan using various parameters.

Netflix Recommendation System

Implement exploratory data analysis, data manipulation, and visualization to understand and find the trends in the Netflix dataset. You will use various Machine Learning algorithms such as association rule mining, classification algorithms, and many more to create movie recommendation systems for viewers using Netflix dataset.

Case Study 1: Introduction to R Programming

In this project, you need to work with several operators involved in R programming including relational operators, arithmetic operators, and logical operators for various organizational needs.

Case Study 2: Solving Customer Churn Using Data Exploration

Use data exploration in order to understand what needs to be done to make reductions in customer churn. In this project, you will be required to extract individual columns, use loops to work on repetitive operations, and create and implement filters for data manipulation.

Case Study 3: Creating Data Structures in R

Implement numerous data structures for numerous possible scenarios. This project requires you to create and use vectors. Further, you need to build and use matrices, utilize arrays for storing those matrices, and have knowledge of lists.

Case Study 4: Implementing SVD in R

Use the MovieLens dataset to analyze and understand singular value decomposition (SVD) and its use in R programming. In this project, you must build custom recommended movie sets for all users, develop a user-based collaborative filtering model, and create a realRatingMatrix for movie recommendation.

Case Study 5: Time Series Analysis

This project requires you to perform time series analysis (TSA) and understand ARIMA and its concepts with respect to a given scenario. Here, you will use the R programming language, the ARIMA model, time series analysis, and data visualization. You must understand how to build and fit an ARIMA model, find optimal parameters by plotting PACF charts, and perform various analyses to predict values.

Module 01 - Introduction to Data Science using Python

1.1 What is Data Science, what does a data scientist do
1.2 Various examples of Data Science in the industries
1.3 How Python is deployed for Data Science applications
1.4 Various steps in Data Science process like data wrangling, data exploration and selecting the model.
1.5 Introduction to Python programming language
1.6 Important Python features, how is Python different from other programming languages
1.7 Python installation, Anaconda Python distribution for Windows, Linux and Mac
1.8 How to run a sample Python script, Python IDE working mechanism
1.9 Running some Python basic commands
1.10 Python variables, data types and keywords.

Hands-on Exercise – Installing the Anaconda Python distribution on Windows, Linux and Mac

Module 02 - Python basic constructs

2.1 Introduction to a basic construct in Python
2.2 Understanding indentation like tabs and spaces
2.3 Python built-in data types
2.4 Basic operators in Python
2.5 Loop and control statements like break, if, for, continue, else, range() and more.

Hands-on Exercise –
1. Write your first Python program
2. Write a Python function (with and without parameters)
3. Use Lambda expression
4. Write a class
5. Create a member function and a variable
6. Create an object and write a for loop to print all odd numbers
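
The exercises above can be sketched in a few lines of Python. This is a minimal illustration only; the names greet, add, square, and Counter are made up for the example and are not part of the course material.

```python
def greet():
    """A function without parameters."""
    return "Hello, Python!"


def add(a, b):
    """A function with parameters."""
    return a + b


# A lambda expression: a small anonymous function bound to a name.
square = lambda x: x * x


class Counter:
    """A class with one member variable and one member function."""

    def __init__(self, limit):
        self.limit = limit  # member variable

    def odd_numbers(self):
        """Member function: odd numbers from 1 up to the limit."""
        return [n for n in range(1, self.limit + 1) if n % 2 == 1]


# Create an object and loop over the odd numbers it produces.
counter = Counter(10)
for n in counter.odd_numbers():
    print(n)
```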

Module 03 - Maths for DS-Statistics & Probability

3.1 Central Tendency
3.2 Variability
3.3 Hypothesis Testing
3.4 ANOVA
3.5 Correlation
3.6 Regression
3.7 Probability Definitions and Notation
3.8 Joint Probabilities
3.9 The Sum Rule, Conditional Probability, and the Product Rule
3.10 Bayes’ Theorem

Hands-on Exercise –
1. We will analyze both categorical data and quantitative data
2. Focusing on specific case studies to help solidify the week’s statistical concepts
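
To make the product rule, the sum rule, and Bayes’ theorem from this module concrete, here is a small worked sketch. The function name bayes_posterior and the screening-test numbers (1% prior, 95% true positive rate, 5% false positive rate) are hypothetical values chosen for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) from P(H), P(E | H), and P(E | not H)."""
    # Sum rule plus product rule: total probability of the evidence.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E).
    return likelihood * prior / evidence


# Hypothetical screening test for a condition with 1% prevalence.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")
```

With these assumed inputs the posterior is roughly 0.16, far below the 95% true positive rate, because the condition is rare to begin with.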

Module 04 - OOPs in Python

4.1 Understanding the OOP paradigm like encapsulation, inheritance, polymorphism and abstraction
4.2 What are access modifiers, instances, class members
4.3 Classes and objects
4.4 Function parameter and return type functions
4.5 Lambda expressions.

Hands-on Exercise –
1. Writing a Python program and incorporating the OOP concepts

Module 05 - NumPy for mathematical computing

5.1 Introduction to mathematical computing in Python
5.2 What are arrays and matrices, array indexing, array math, Inspecting a numpy array, Numpy array manipulation

Hands-on Exercise –
1. How to import numpy module
2. Creating array using ND-array
3. Calculating standard deviation on array of numbers and calculating correlation between two variables.

Module 06 - Scipy for scientific computing

6.1 Introduction to scipy, building on top of numpy
6.2 What are the characteristics of scipy
6.3 Various subpackages for scipy like Signal, Integrate, Fftpack, Cluster, Optimize, Stats and more, Bayes Theorem with scipy.

Hands-on Exercise:
1. Importing of scipy
2. Applying the Bayes theorem on the given dataset.

Module 07 - Data manipulation

7.1 What is data manipulation? Using the Pandas library
7.2 Numpy dependency of Pandas library
7.3 Series object in pandas
7.4 Dataframe in Pandas
7.5 Loading and handling data with Pandas
7.6 How to merge data objects
7.7 Concatenation and various types of joins on data objects, exploring dataset

Hands-on Exercise –
1. Doing data manipulation with Pandas by handling tabular datasets that include variable types like float, integer, double and others.
2. Cleaning dataset, Manipulating dataset, Visualizing dataset

Module 08 - Data visualization with Matplotlib

8.1 Introduction to Matplotlib
8.2 Using Matplotlib for plotting graphs and charts like Scatter, Bar, Pie, Line, Histogram and more
8.3 Matplotlib API

Hands-on Exercise –
1. Deploying Matplotlib for creating pie, scatter, line and histogram.
2. Subplots and Pandas built-in data visualization.

Module 09 - Machine Learning using Python

9.1 Revision of topics in Python (Pandas, Matplotlib, numpy, scikit-Learn)
9.2 Introduction to machine learning
9.3 Need of Machine learning
9.4 Types of machine learning and workflow of Machine Learning
9.5 Use cases in Machine Learning and its various algorithms
9.6 What is supervised learning
9.7 What is Unsupervised Learning

Hands-on Exercise –
1. Demo on ML algorithms

Module 10 - Supervised learning

10.1 What is linear regression
10.2 Step by step calculation of Linear Regression
10.3 Linear regression in Python
10.4 Logistic Regression
10.5 What is classification
10.6 Decision Tree, Confusion Matrix, Random Forest, Naïve Bayes classifier (self-paced), Support Vector Machine (self-paced), XGBoost (self-paced)

Hands-on Exercise – Using Python library Scikit-Learn for coming up with Random Forest algorithm to implement supervised learning.

Module 11 - Unsupervised Learning

11.1 Introduction to unsupervised learning
11.2 Use cases of unsupervised learning
11.3 What is clustering
11.4 Types of clustering (self-paced): exclusive clustering, overlapping clustering, and hierarchical clustering (self-paced)
11.5 What is K-means clustering
11.6 Step by step calculation of k-means algorithm
11.7 Association Rule Mining (self-paced), Market Basket Analysis (self-paced), and measures in association rule mining (self-paced): support, confidence, and lift
11.8 Apriori Algorithm

Hands-on Exercise –
1. Setting up the Jupyter notebook environment
2. Loading of a dataset in Jupyter
3. Algorithms in Scikit-Learn package for performing Machine Learning techniques and training a model to search a grid.
4. Practice on k-means using Scikit
5. Practice on Apriori
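
The three association rule measures named in this module (support, confidence, and lift) can be computed by hand on a toy basket dataset. The sketch below uses only the standard library; the transactions and the helper names are invented for illustration, and a full Apriori implementation would additionally search for frequent itemsets rather than score a single rule.

```python
# Toy market-basket data: each transaction is a set of items (made up).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in transactions)


def support(itemset):
    """Share of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Share of antecedent transactions that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)


rule = ({"bread"}, {"milk"})
print("support:   ", support(rule[0] | rule[1]))
print("confidence:", confidence(*rule))
print("lift:      ", lift(*rule))
```

In this toy data, 3 of the 4 baskets with bread also contain milk, and the lift comes out slightly below 1, meaning bread buyers are marginally less likely than average to buy milk.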

Module 12 - Python integration with Spark (self-paced)

12.1 Introduction to pyspark
12.2 Who uses pyspark, need of spark with python
12.3 Pyspark installation
12.4 Pyspark fundamentals
12.5 Advantages of PySpark over MapReduce
12.6 PySpark use cases and demo

Hands-on Exercise:
1. Demonstrating Loops and Conditional Statements
2. Tuple – related operations, properties, list, etc.
3. List – operations, related properties
4. Set – properties, associated operations, dictionary – operations, related properties.

Module 13 - Dimensionality Reduction

13.1 Introduction to Dimensionality
13.2 Why Dimensionality Reduction
13.3 PCA
13.4 Factor Analysis
13.5 LDA

Hands-on Exercise –
Practice dimensionality reduction techniques: PCA, Factor Analysis, t-SNE, Random Forest, and forward and backward feature selection

Module 14 - Time Series Forecasting

14.1 White Noise
14.2 AR model
14.3 MA model
14.4 ARMA model
14.5 ARIMA model
14.6 Stationarity
14.7 ACF & PACF

Hands-on Exercise –
1. Create AR model
2. Create MA model
3. Create ARMA model
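
As a minimal sketch of the first exercise, the code below simulates a zero-mean AR(1) process driven by white noise and recovers its coefficient by least squares. The function names, the coefficient 0.6, and the seed are assumptions made for the example; in practice a statistics library would typically be used, and MA and ARMA models need iterative estimation that is not shown here.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal white noise e_t."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, 1.0))
    return series


def fit_ar1(series):
    """Least-squares estimate of phi for a zero-mean AR(1) model (no intercept)."""
    numerator = sum(curr * prev for prev, curr in zip(series, series[1:]))
    denominator = sum(prev * prev for prev in series[:-1])
    return numerator / denominator


series = simulate_ar1(phi=0.6, n=5000, seed=42)
print("estimated phi:", round(fit_ar1(series), 3))
```

The estimate should land close to the true coefficient, and it tightens as the series gets longer; values of phi with magnitude below 1 keep the simulated series stationary.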

Data Science with Python Projects

Analyzing the Trends of COVID-19 With Python

In this project, you will use Pandas to accumulate data from multiple data files, Plotly (visualization library) to create interactive visualizations, and Facebook’s Prophet library to make time series models. You will also be visualizing the prediction by combining these technologies.

Analyzing the Naming Trends Using Python

In this project, you will use Python Programming and Algorithms to understand the applications of data manipulation, extract files with useful data only, and concepts of data visualization. You will be required to analyze baby names by sorting out the top 100 birth counts.

Performing Analysis on Customer Churn Dataset

Through this project, you will be analyzing employment reliability in the telecom industry. The project will require you to work on real-time analysis of data with multiple labels, data visualization for reliability factor, visual analysis of various columns to verify, and plotting charts to substantiate the findings in total.

Netflix-Recommendation System

Analysis of movies dataset and recommendation of movies with respect to ratings. You will be working with the combined data of movies and their ratings, performing data analysis on various labels in the data, finding the distribution of different ratings in the dataset, and training the SVD for the prediction of the model.

Python Web Scraping for Data Science

In this project, you will learn web scraping using Python. You will work on Beautiful Soup, web scraping libraries, common data and page format on the web, the important kinds of objects, Navigable String, the searching tree deployment, navigation options, parser, search tree, searching by CSS class, list, function, and keyword argument.

OOPS in Python

Creating multiple methods using OOPS. You will work on methods like “check_balance” to check the remaining balance in an account and “withdraw” to withdraw from the bank, and override “withdraw” to ensure that the minimum balance is maintained. You will also work with Parameterization and Classes.
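
One possible shape for this project is sketched below. Only the method names check_balance and withdraw come from the project description; the class names Account and MinimumBalanceAccount, the minimum_balance parameter, and the use of ValueError are assumptions made for illustration.

```python
class Account:
    """A basic bank account (hypothetical base class)."""

    def __init__(self, balance=0):
        self.balance = balance

    def check_balance(self):
        """Return the remaining balance."""
        return self.balance

    def withdraw(self, amount):
        """Withdraw an amount, refusing overdrafts and non-positive amounts."""
        if amount <= 0:
            raise ValueError("withdrawal amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance


class MinimumBalanceAccount(Account):
    """Overrides withdraw so the balance never drops below a minimum."""

    def __init__(self, balance=0, minimum_balance=100):
        super().__init__(balance)
        self.minimum_balance = minimum_balance

    def withdraw(self, amount):
        if self.balance - amount < self.minimum_balance:
            raise ValueError("withdrawal would breach the minimum balance")
        # Reuse the parent checks and bookkeeping.
        return super().withdraw(amount)


account = MinimumBalanceAccount(balance=500, minimum_balance=100)
account.withdraw(300)
print(account.check_balance())
```

Calling super().withdraw() in the subclass keeps the base validation in one place, so the override only adds the minimum-balance rule.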

Working With NumPy

In this case study, you will be working with the NumPy library to solve various problems in Python. You will create 2D arrays, initialize a NumPy array of 5*5 dimensions, and perform simple arithmetic operations on the two arrays. To carry out this case study successfully, you will have to be familiar with NumPy.

Visualizing and Analyzing the Customer Churn dataset using Python

This case study will require you to analyze data by building aesthetic graphs to make better sense of the data. You will be working with the ggplot2 package, bar plots and its applications, histogram graphs for data analysis, and box-plots and outliers in them.

Building Models With the Help of Machine Learning Algorithms

You will be designing tree-based models on the ‘Heart’ dataset, performing real-time data manipulation on the heart dataset, data-visualization for multiple columnar data, building a tree-based model on top of the database, and designing a probabilistic classification model on the database. You will have to be familiar with ML Algorithms.

Module 1 - Introduction to Data Visualization and Power of Tableau

1.1 What is data visualization?
1.2 Comparison and benefits against reading raw numbers
1.3 Real use cases from various business domains
1.4 Some quick and powerful examples using Tableau without going into the technical details of Tableau
1.5 Installing Tableau
1.6 Tableau interface
1.7 Connecting to DataSource
1.8 Tableau data types
1.9 Data preparation

Module 2 - Architecture of Tableau

2.1 Installation of Tableau Desktop
2.2 Architecture of Tableau
2.3 Interface of Tableau (Layout, Toolbars, Data Pane, Analytics Pane, etc.)
2.4 How to start with Tableau
2.5 The ways to share and export the work done in Tableau

Hands-on Exercise:

1. Play with Tableau desktop
2. Learn about the interface
3. Share and export existing works

Module 3 - Working with Metadata and Data Blending

3.1 Connection to Excel
3.2 Cubes and PDFs
3.3 Management of metadata and extracts
3.4 Data preparation
3.5 Joins (Left, Right, Inner, and Outer) and Union
3.6 Dealing with NULL values, cross-database joining, data extraction, data blending, refresh extraction, incremental extraction, how to build extract, etc.

Hands-on Exercise:

1. Connect to Excel sheet to import data
2. Use metadata and extracts
3. Manage NULL values
4. Clean up data before using
5. Perform the join techniques
6. Execute data blending from multiple sources

Module 4 - Creation of Sets

4.1 Mark, highlight, sort, group, and use sets (creating and editing sets, IN/OUT, sets in hierarchies)
4.2 Constant sets
4.3 Computed sets, bins, etc.

Hands-on Exercise:

1. Use marks to create and edit sets
2. Highlight the desired items
3. Make groups
4. Apply sorting on results
5. Make hierarchies among the created sets

Module 5 - Working with Filters

5.1 Filters (addition and removal)
5.2 Filtering continuous dates, dimensions, and measures
5.3 Interactive filters, marks card, and hierarchies
5.4 How to create folders in Tableau
5.5 Sorting in Tableau
5.6 Types of sorting
5.7 Filtering in Tableau
5.8 Types of filters
5.9 Filtering the order of operations

Hands-on Exercise:

1. Use the data set by date/dimensions/measures to add a filter
2. Use interactive filter to view the data
3. Customize/remove filters to view the result

Module 6 - Organizing Data and Visual Analytics

6.1 Using Formatting Pane to work with menu, fonts, alignments, settings, and copy-paste
6.2 Formatting data using labels and tooltips
6.3 Edit axes and annotations
6.4 K-means cluster analysis
6.5 Trend and reference lines
6.6 Visual analytics in Tableau
6.7 Forecasting, confidence interval, reference lines, and bands

Hands-on Exercise:

1. Apply labels and tooltips to graphs, annotations, edit axes’ attributes
2. Set the reference line
3. Perform k-means cluster analysis on the given dataset

Module 7 - Working with Mapping

7.1 Working on coordinate points
7.2 Plotting longitude and latitude
7.3 Editing unrecognized locations
7.4 Customizing geocoding, polygon maps, WMS: web mapping services
7.5 Working on the background image, including add image
7.6 Plotting points on images and generating coordinates from them
7.7 Map visualization, custom territories, map box, WMS map
7.8 How to create map projects in Tableau
7.9 Creating dual axes maps, and editing locations

Hands-on Exercise:

1. Plot longitude and latitude on a geo map
2. Edit locations on the geo map
3. Custom geocoding
4. Use images of the map and plot points
5. Find coordinates
6. Create a polygon map
7. Use WMS

Module 8 - Working with Calculations and Expressions

8.1 Calculation syntax and functions in Tableau
8.2 Various types of calculations, including Table, String, Date, Aggregate, Logic, and Number
8.3 LOD expressions, including concept and syntax
8.4 Aggregation and replication with LOD expressions
8.5 Nested LOD expressions
8.6 Levels of details: fixed level, lower level, and higher level
8.7 Quick table calculations
8.8 The creation of calculated fields
8.9 Predefined calculations
8.10 How to validate

Module 9 - Working with Parameters

9.1 Creating parameters
9.2 Parameters in calculations
9.3 Using parameters with filters
9.4 Column selection parameters
9.5 Chart selection parameters
9.6 How to use parameters in the filter session
9.7 How to use parameters in calculated fields
9.8 How to use parameters in the reference line

Hands-on Exercise:

1. Creating new parameters to apply on a filter
2. Passing parameters to filters to select columns
3. Passing parameters to filters to select charts

Module 10 - Charts and Graphs

10.1 Dual axes graphs
10.2 Histograms
10.3 Single and dual axes
10.4 Box plot
10.5 Charts: motion, Pareto, funnel, pie, bar, line, bubble, bullet, scatter, and waterfall charts
10.6 Maps: tree and heat maps
10.7 Market basket analysis (MBA)
10.8 Using Show me
10.9 Text table and highlighted table

Hands-on Exercise:

1. Plot a histogram, tree map, heat map, funnel chart, and more using the given dataset
2. Perform market basket analysis (MBA) on the same dataset

Module 11 - Dashboards and Stories

11.1 Building and formatting a dashboard using size, objects, views, filters, and legends
11.2 Best practices for making creative as well as interactive dashboards using the actions
11.3 Creating stories, including the intro of story points
11.4 Creating as well as updating the story points
11.5 Adding catchy visuals in stories
11.6 Adding annotations with descriptions; dashboards and stories
11.7 What is a dashboard?
11.8 Highlight actions, URL actions, and filter actions
11.9 Selecting and clearing values
11.10 Best practices to create dashboards
11.11 Dashboard examples; using Tableau workspace and Tableau interface
11.12 Learning about Tableau joins
11.13 Types of joins
11.14 Tableau field types
11.15 Saving as well as publishing data source
11.16 Live vs extract connection
11.17 Various file types

Hands-on Exercise:

1. Create a Tableau dashboard view, include legends, objects, and filters
2. Make the dashboard interactive
3. Use visual effects, annotations, and descriptions to create and edit a story

Module 12 - Tableau Prep

12.1 Introduction to Tableau Prep
12.2 How Tableau Prep helps quickly combine, join, shape, and clean data for analysis
12.3 Creation of smart examples with Tableau Prep
12.4 Getting deeper insights into the data with great visual experience
12.5 Making data preparation simpler and accessible
12.6 Integrating Tableau Prep with Tableau analytical workflow
12.7 Understanding the seamless process from data preparation to analysis with Tableau Prep

Module 13 - Integration of Tableau with R and Hadoop

13.1 Introduction to R language
13.2 Applications and use cases of R
13.3 Deploying R on the Tableau platform
13.4 Learning R functions in Tableau
13.5 The integration of Tableau with Hadoop

Hands-on Exercise:

1. Deploy R on Tableau
2. Create a line graph using R interface
3. Connect Tableau with Hadoop to extract data

Tableau Projects Covered

Understanding the global COVID-19 mortality rates

Analyze and develop a dashboard to understand the global COVID-19 cases. Compare the global confirmed cases against deaths on a world map. Compare the country-wise cases using logarithmic axes. The dashboard should let users switch interactively between a log-axis chart and a default-axis chart. Create a parameter to dynamically view the Top N WHO regions based on the ratio of cumulative new cases to deaths. The dashboard should have a drop-down menu to view the WHO region-wise data as a bar chart, line chart, or map, as the user requires.

Understand the UK bank customer data

Analyze and develop a dashboard to understand the customer data of a UK bank. Create an asymmetric drop-down of regions with their respective customer names and balances, color-coded by gender. Create a region-wise bar chart that displays the count of customers with high and low balances. Create a parameter that lets users dynamically decide the balance limit that separates high from low. Include interactive filters for job classifications and highlighters for region in the final dashboard.

Understand Financial Data

Create an interactive map to analyze worldwide sales and profit. Include map layers and map styles to enhance the visualization. Create an interactive analysis that displays the average gross sales of a product under each segment, allowing only one segment's data to be displayed at a time. Create a motion chart to compare sales and profit through the years. Annotate the day-wise profit line chart to indicate the peaks, and enable drop lines. Add go-to-URL actions in the final dashboard that direct the user to the respective country’s Wikipedia page.

Understand Agriculture Data

Create an interactive tree map to display district-wise data. The tree map should have state labels. On hovering over a particular state, the corresponding districts' data should be displayed. Add URL actions that direct users to a Google search page for the selected crop. The web page should be displayed on the final dashboard. Create a hierarchy of seasons, crop categories and the list of crops under each. Add highlighters for season. One major sheet in the final dashboard should be unaffected by any action applied. Use the view in this major sheet to filter data in the others. Using parameters, color-code the seasons with high yield and low yield based on their crop categories. Rank the crops based on their yield.

Module 1 - Splunk Development Concepts

1.1 Introduction to Splunk and Splunk developer roles and responsibilities

Module 2 - Basic Searching

2.1 Writing Splunk query for search
2.2 Auto-complete to build a search
2.3 Time range
2.4 Refine search
2.5 Working with events
2.6 Identifying the contents of search
2.7 Controlling a search job

Hands-on Exercise –
Write a basic search query

Module 3 - Using Fields in Searches

3.1 What is a Field
3.2 How to use Fields in search
3.3 Deploying Fields Sidebar and Field Extractor for REGEX field extraction
3.4 Delimiting Field Extraction using FX

Hands-on Exercise –

  1. Use Fields in Search
  2. Use Fields Sidebar
  3. Use Field Extractor (FX)
  4. Delimit field Extraction using FX

Module 4 - Saving and Scheduling Searches

4.1 Writing Splunk query for search, sharing, saving, scheduling and exporting search results

Hands-on Exercise –

  1. Schedule a search
  2. Save a search result
  3. Share and export a search result

Module 5: Creating Alerts

5.1 How to create alerts
5.2 Understanding alerts
5.3 Viewing fired alerts

Hands-on Exercise –

  1. Create an alert in Splunk
  2. View the fired alerts

Module 6 - Scheduled Reports

6.1 Describe and configure scheduled reports

Module 7 - Tags and Event Types

7.1 Introduction to Tags in Splunk
7.2 Deploying Tags for Splunk search
7.3 Understanding event types and utility
7.4 Generating and implementing event types in search

Hands-on Exercise –

  1. Deploy tags for Splunk search
  2. Generate and implement event types in search

Module 8 - Creating and Using Macros

8.1 What is a Macro
8.2 What are variables and arguments in Macros

Hands-on Exercise –

  1. Define a Macro with arguments and then use variables within it

Module 9 - Workflow

9.1 Creating get, post and search workflow actions

Hands-on Exercise –

  1. Create get, post and search workflow actions

Module 10 - Splunk Search Commands

10.1 Studying the search command
10.2 The general search practices
10.3 What is a search pipeline
10.4 How to specify indexes in search
10.5 Highlighting the syntax
10.6 Deploying the various search commands like fields, table, sort, rename, rex and erex

Hands-on Exercise –

  1. Steps to create a search pipeline
  2. Search index specification
  3. How to highlight syntax
  4. Using the auto complete feature
  5. Deploying the various search commands like sort, fields, table, rename, rex and erex

Module 11 - Transforming Commands

11.1 Using top, rare and stats commands

Hands-on Exercise –

  1. Use top, rare and stats commands

Module 12 - Reporting Commands

12.1 Using following commands and their functions: addcoltotals, addtotals, top, rare and stats

Hands-on Exercise –

  1. Create reports using following commands and their functions: addcoltotals and addtotals

Module 13 - Mapping and Single Value Commands

13.1 iplocation, geostats, geom and addtotals commands

Hands-on Exercise –

  1. Track IP using iplocation and get geo data using geostats

Module 14 - Splunk Reports and Visualizations

14.1 Explore the available visualizations
14.2 Create charts and time charts
14.3 Omit null values and format results

Hands-on Exercise –

  1. Create time charts
  2. Omit null values
  3. Format results

Module 15 - Analyzing, Calculating and Formatting Results

15.1 Calculating and analyzing results
15.2 Value conversion
15.3 Roundoff and format values
15.4 Using the eval command
15.5 Conditional statements
15.6 Filtering calculated search results

Hands-on Exercise –

  1. Calculate and analyze results
  2. Perform conversion on a data value
  3. Roundoff numbers
  4. Use the eval command
  5. Write conditional statements
  6. Apply filters on calculated search results

Module 16 - Correlating Events

16.1 How to search the transactions
16.2 Creating report on transactions
16.3 Grouping events using time and fields
16.4 Comparing transactions with stats

Hands-on Exercise –

  1. Generate report on transactions
  2. Group events using fields and time

Module 17 - Enriching Data with Lookups

17.1 Learning data lookups
17.2 Examples and lookup tables
17.3 Defining and configuring automatic lookups
17.4 Deploying lookups in reports and searches

Hands-on Exercise –

  1. Define and configure automatic lookups
  2. Deploy lookups in reports and searches

Module 18 - Creating Reports and Dashboards

18.1 Creating search charts, reports and dashboards
18.2 Editing reports and dashboards
18.3 Adding reports to dashboards

Hands-on Exercise –

  1. Create search charts, reports and dashboards
  2. Edit reports and dashboards
  3. Add reports to dashboards

Module 19 - Getting Started with Parsing

19.1 Working with raw data for data extraction, transformation, parsing and preview

Hands-on Exercise –

  1. Extract useful data from raw data
  2. Perform transformation
  3. Parse different values and preview

Module 20 - Using Pivot

20.1 Describe pivot
20.2 Relationship between data model and pivot
20.3 Select a data model object
20.4 Create a pivot report
20.5 Create instant pivot from a search
20.6 Add a pivot report to dashboard

Hands-on Exercise –

  1. Select a data model object
  2. Create a pivot report
  3. Create instant pivot from a search
  4. Add a pivot report to dashboard

Module 21 - Common Information Model (CIM) Add-On

21.1 What is a Splunk CIM
21.2 Using the CIM Add-On to normalize data

Hands-on Exercise –

  1. Use the CIM Add-On to normalize data

Splunk Administration Topics

Module 22 - Overview of Splunk

22.1 Introduction to the architecture of Splunk
22.2 Various server settings
22.3 How to set up alerts
22.4 Various types of licenses
22.5 Important features of Splunk tool
22.6 The requirements of hardware and conditions needed for installation of Splunk

Module 23 - Splunk Installation

23.1 How to install and configure Splunk
23.2 The creation of index
23.3 Standalone server’s input configuration
23.4 The preferences for search
23.5 Linux environment Splunk installation
23.6 The administering and architecting of Splunk

Module 24 - Splunk Installation in Linux

24.1 How to install Splunk in the Linux environment
24.2 The conditions needed for Splunk
24.3 Configuring Splunk in the Linux environment

Module 25 - Distributed Management Console

25.1 Introducing Splunk distributed management console
25.2 Indexing of clusters
25.3 How to deploy distributed search in Splunk environment
25.4 Forwarder management
25.5 User authentication and access control

Module 26 - Introduction to Splunk App

26.1 Introduction to the Splunk app
26.2 How to develop Splunk apps
26.3 Splunk app management
26.4 Splunk app add-ons
26.5 Using Splunk-base for installation and deletion of apps
26.6 Different app permissions and implementation
26.7 How to use the Splunk app
26.8 Apps on forwarder

Module 27 - Splunk Indexes and Users

27.1 Details of the index time configuration file
27.2 The search time configuration file

Module 28 - Splunk Configuration Files

28.1 Understanding of Index time and search time configuration files in Splunk
28.2 Forwarder installation
28.3 Input and output configuration
28.4 Universal Forwarder management
28.5 Splunk Universal Forwarder highlights

Module 29 - Splunk Deployment Management

29.1 Implementing the Splunk tool
29.2 Deploying it on the server
29.3 Splunk environment setup
29.4 Splunk client group deployment

Module 30 - Splunk Indexes

30.1 Understanding the Splunk Indexes
30.2 The default Splunk Indexes
30.3 Segregating the Splunk Indexes
30.4 Learning Splunk Buckets and Bucket Classification
30.5 Estimating Index storage
30.6 Creating new Index

Module 31 - User Roles and Authentication

31.1 Understanding the concept of role inheritance
31.2 Splunk authentications
31.3 Native authentications
31.4 LDAP authentications

Module 32 - Splunk Administration Environment

32.1 Splunk installation, configuration
32.2 Data inputs
32.3 App management
32.4 Splunk important concepts
32.5 Parsing machine-generated data
32.6 Search indexer and forwarder

Module 33 - Basic Production Environment

33.1 Introduction to Splunk Configuration Files
33.2 Universal Forwarder
33.3 Forwarder Management
33.4 Data management, troubleshooting and monitoring

Module 34 - Splunk Search Engine

34.1 Converting machine-generated data into operational intelligence
34.2 Setting up the dashboard, reports and charts
34.3 Integrating Search Head Clustering and Indexer Clustering

Module 35 - Various Splunk Input Methods

35.1 Understanding the input methods
35.2 Deploying scripted, Windows and network
35.3 Agentless input types and fine-tuning them all

Module 36 - Splunk User and Index Management

36.1 Splunk user authentication and job role assignment
36.2 Learning to manage, monitor and optimize Splunk Indexes

Module 37 - Machine Data Parsing

37.1 Understanding parsing of machine-generated data
37.2 Manipulation of raw data
37.3 Previewing and parsing
37.4 Data field extraction
37.5 Comparing single-line and multi-line events

Module 38 - Search Scaling and Monitoring

38.1 Distributed search concepts
38.2 Improving search performance
38.3 Large-scale deployment and overcoming execution hurdles
38.4 Working with Splunk Distributed Management Console for monitoring the entire operation

Module 39 - Splunk Cluster Implementation

39.1 Cluster indexing
39.2 Configuring individual nodes
39.3 Configuring the cluster behavior, index and search behavior
39.4 Setting node type to handle different aspects of cluster like master node, peer node and search head

What projects will I be working on in this Splunk Developer and Admin training?

Project 1 : Creating an Employee Database of a Company

Industry : General

Problem Statement : How to build a Splunk dashboard where employee details are readily available

Topics : In this project, you will create a text file of employee data with details like full name, salary, designation, ID and so on. You will index the data based on various parameters, use various Splunk commands for evaluating and extracting the information. Finally, you will create a dashboard and add various reports to it.

Highlights :

  • Splunk search and index commands
  • Extracting field in search and saving results
  • Editing event types and adding tags

Project 2 : Building an Organizational Dashboard with Splunk

Industry :  E-commerce

Problem Statement : How to analyze website traffic and gather insights

Topics : In this project, you will build an analytics dashboard for a website and create alerts for various conditions. You will capture the access logs of the web server and then upload the sample logs. You will analyze the top ten users, the average time spent, the peak response time of the website, the top ten errors and the error code descriptions. You will also create a Splunk dashboard for reporting and analysis.

Highlights :

  • Creating bar and line charts
  • Sending alerts for various conditions
  • Providing admin rights for dashboard

Project 3 : Field Extraction in Splunk

Industry : General

Problem Statement : How to extract the fields from event data in Splunk

Topics : In this project, you will learn to extract fields from events using the Splunk field extraction technique. You will gain knowledge in the basics of field extractions, understand the use of the field extractor, the field extraction page in Splunk web and field extract configuration in files. You will learn the regular expression and delimiters method of field extraction. Upon the completion of the project, you will gain expertise in building Splunk dashboard and use the extracted fields data in it to create rich visualizations in an enterprise setup.

Highlights :

  • Field extraction using delimiter method
  • Delimit field extracts using FX
  • Extracting fields with the search command
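
The two extraction methods named in this project, regular expressions and delimiters, can be illustrated outside Splunk with a minimal Python sketch. This is not Splunk's field extractor or SPL; the log line, the record, and every field name below are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical web access log line; the layout and field names are assumptions.
LOG_LINE = '10.0.0.7 - alice [12/Mar/2024:10:15:32] "GET /cart HTTP/1.1" 500 1043'

# Regular-expression method: every named group becomes one extracted field.
ACCESS_PATTERN = re.compile(
    r'(?P<clientip>\S+) - (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+)'
)


def extract_with_regex(line):
    """Return a dict of fields captured by the named groups, or {} if no match."""
    match = ACCESS_PATTERN.match(line)
    return match.groupdict() if match else {}


def extract_with_delimiter(line, delimiter, field_names):
    """Split the line on a delimiter and pair each value with a field name."""
    values = [value.strip() for value in line.split(delimiter)]
    return dict(zip(field_names, values))


regex_fields = extract_with_regex(LOG_LINE)
delimited_fields = extract_with_delimiter(
    "1001,Asha Rao,Analyst,52000",  # hypothetical employee record
    ",",
    ["id", "name", "designation", "salary"],
)

print(regex_fields["status"], regex_fields["uri"])
print(delimited_fields["designation"])
```

The regex method suits free-form events where fields are recognized by their surrounding text, while the delimiter method is enough when every event has the same separator-based layout.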

Introduction to SAS

Installation and introduction to SAS, how to get started with SAS, how to work with data sets, the various SAS windows such as output, search, editor, log and explorer, and understanding SAS functions, library types and programming files

SAS Enterprise Guide

How to import and export raw data files, how to read and subset the data sets, different statements like SET, MERGE and WHERE

Hands-on Exercise: How to import the Excel file in the workspace and how to read data and export the workspace to save data

SAS Operators and Functions

Different SAS operators like logical, comparison and arithmetic, deploying different SAS functions like Character, Numeric, Is Null, Contains, Like and Input/Output, along with the conditional statements like If/Else, Do While, Do Until and so on

Hands-on Exercise: Performing operations using the SAS functions and logical and arithmetic operations

Compilation and Execution

Understanding the input buffer and the PDV (program data vector) used behind the scenes, and learning what MISSOVER is

Using Variables

Defining and using KEEP and DROP statements, and applying these statements along with formats and labels in SAS

Hands-on Exercise: Use KEEP and DROP statements

Creation and Compilation of SAS Data Sets

Understanding the delimiter, dataline rules, DLM, delimiter DSD, raw data files and execution and list input for standard data

Hands-on Exercise: Use delimiter rules on raw data files

SAS Procedures

Various standard built-in SAS procedures: PROC SORT, PROC FREQ, PROC SUMMARY, PROC RANK, PROC EXPORT, PROC DATASETS, PROC TRANSPOSE, PROC CORR, etc.

Hands-on Exercise: Use SORT, FREQ, SUMMARY, EXPORT and other procedures

Input Statement and Formatted Input

Reading standard and non-standard numeric inputs with formatted inputs, column pointer controls, controlling while a record loads, line pointer control/absolute line pointer control, single trailing, multiple IN and OUT statements, dataline statement and rules, list input method and comparing single trailing and double trailing

Hands-on Exercise:  Read standard and non-standard numeric inputs with formatted inputs, control while a record loads, control a line pointer and write multiple IN and OUT statements

SAS Format

SAS Format statements: standard and user-written, associating a format with a variable, working with SAS Format, deploying it on PROC data sets and comparing ATTRIB and Format statements

Hands-on Exercise: Format a variable, deploy format rule on PROC data set and use ATTRIB statement

SAS Graphs

Understanding PROC GCHART, various chart types (pie, bar and 3D) and plotting variables with PROC GPLOT

Hands-on Exercise: Plot graphs using PROC GPLOT and display charts using PROC GCHART

Interactive Data Processing

SAS advanced data discovery and visualization, point-and-click analytics capabilities and powerful reporting tools

Data Transformation Function

Character functions, numeric functions and converting variable type

Hands-on Exercise: Use functions in data transformation

Output Delivery System (ODS)

Introduction to ODS, data optimization and how to generate files (rtf, pdf, html and doc) using SAS

Hands-on Exercise: Optimize data and generate rtf, pdf, html and doc files

SAS Macros

Macro Syntax, macro variables, positional parameters in a macro and macro step

Hands-on Exercise: Write a macro and use positional parameters

PROC SQL

SQL statements in SAS, SELECT, CASE, JOIN and UNION and sorting data

Hands-on Exercise: Create SQL query to select and add a condition and use a CASE in select query

Advanced Base SAS

Base SAS web-based interface and ready-to-use programs, advanced data manipulation, storage and retrieval and descriptive statistics

Hands-on Exercise: Use web UI to do statistical operations

Summarization Reports

Report enhancement, global statements, user-defined formats, PROC SORT, ODS destinations, ODS listing, PROC FREQ, PROC MEANS, PROC UNIVARIATE, PROC REPORT and PROC PRINT

Hands-on Exercise: Use PROC SORT to sort the results, list ODS, find the mean using PROC MEANS and print using PROC PRINT

SAS Projects

Categorization of Patients Based on the Count of Drugs for Their Therapy

This project aims to compute descriptive statistics and subsets for specific clinical data problems. It will give you a brief insight into Base SAS procedures and data steps.

Build Revenue Projections Reports

You will be working with the SAS data analytics and business intelligence tool. You will get to work on the data entered in a business enterprise setup and will aggregate, retrieve, and manage that data. Create insightful reports and graphs and come up with statistical and mathematical analysis to predict revenue projection.

Impact of Pre-paid Plans on the Preferences of Investors

This project aims to find the most impacting factors in the preferences of the pre-paid model. The project also identifies which variables are highly correlated with impacting factors. In addition to this, the project also looks to identify various insights that would help a newly established brand to foray deeper into the market on a large scale.

K-means Cluster Analysis on Iris Dataset

In this project, you will be required to do k-means cluster analysis on the Iris dataset to predict the class of a flower using the dimensions of its petals.
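
To make the idea behind k-means concrete, here is a minimal sketch in Python rather than SAS. It is a language-neutral illustration of the assign-then-update loop, not the procedure used in the project. The petal measurements are made-up toy values, not rows from the Iris dataset, and the starting centroids are picked by hand for reproducibility.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and updating centroids until stable."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(point, centroids[k]))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            new_centroids.append(mean_point(members) if members else centroids[k])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Toy (petal length, petal width) pairs in cm; illustrative values only.
petals = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (4.7, 1.4), (4.5, 1.5), (5.0, 1.7)]
labels, centroids = k_means(petals, initial_centroids=[petals[0], petals[3]])
print(labels)
```

Because k-means is unsupervised, the cluster numbers it returns are arbitrary; mapping a cluster to a flower class requires comparing the clusters with known labels afterwards.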

Module 01 - Introduction to Deep Learning and Neural Networks

1.1 The field of machine learning and its impact on the field of artificial intelligence
1.2 The benefits of machine learning compared with traditional methodologies
1.3 Introduction to deep learning and how it differs from other machine learning methods
1.4 Classification and regression in supervised learning
1.5 Clustering and association in unsupervised learning, and the algorithms used in these categories
1.6 Introduction to AI and neural networks
1.7 Machine learning concepts
1.8 Supervised learning with neural networks
1.9 Fundamentals of statistics, hypothesis testing, probability distributions, and hidden Markov models

Module 02 - Multi-layered Neural Networks

2.1 Multi-layer network introduction, regularization, deep neural networks
2.2 Multi-layer perceptron
2.3 Overfitting and capacity
2.4 Neural network hyperparameters, logic gates
2.5 Different activation functions used in neural networks, including ReLU, softmax, sigmoid and hyperbolic tangent (tanh) functions
2.6 Back propagation, forward propagation, convergence, hyperparameters, and overfitting
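
As a quick reference for the activation functions listed in this module, the sketch below writes ReLU, sigmoid, the hyperbolic tangent and softmax in plain Python using their standard definitions. Deep learning libraries ship optimized, vectorized versions; this is only meant to show what each function computes.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic function: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent: squashes any real input into the range (-1, 1)."""
    return math.tanh(x)


def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(values)
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 2.0, 3.0]
probabilities = softmax(scores)
print(relu(-2.0), relu(3.0))
print(sigmoid(0.0), tanh(0.0))
print(probabilities)
```

ReLU, sigmoid and tanh act on one value at a time and are typically used in hidden layers, while softmax acts on a whole vector and is usually placed on the output layer of a multi-class classifier.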

Module 03 - Artificial Neural Networks and Various Methods

3.1 Various methods that are used to train artificial neural networks
3.2 Perceptron learning rule, gradient descent rule, tuning the learning rate, regularization techniques, optimization techniques
3.3 Stochastic process, vanishing gradients, transfer learning, regression techniques
3.4 Lasso (L1) and ridge (L2) regularization, unsupervised pre-training, Xavier initialization
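
The perceptron learning rule from this module can be sketched in a few lines of Python. The example below trains a single perceptron on the AND logic gate. The learning rate, epoch count, zero initialization and step activation are illustrative choices, not values prescribed by the course.

```python
def predict(weights, bias, inputs):
    """Step activation: output 1 when the weighted sum is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, learning_rate=0.1, epochs=50):
    """Perceptron rule: nudge weights by learning_rate * error * input."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Truth table of the AND gate: output is 1 only when both inputs are 1.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_GATE)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_GATE]
print(predictions)
```

The rule only changes the weights when a prediction is wrong, so training settles once every row of a linearly separable table such as AND is classified correctly. A gate like XOR is not linearly separable and needs a multi-layer network.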

Module 04 - Deep Learning Libraries

4.1 Understanding how deep learning works
4.2 Activation functions, illustrating perceptron, perceptron training
4.3 Multi-layer perceptron, key parameters of perceptron
4.4 Introduction to TensorFlow, the open-source software library used to design, create and train deep learning models
4.5 Google’s Tensor Processing Unit (TPU), a programmable AI accelerator
4.6 Python libraries in TensorFlow, code basics, variables, constants, placeholders
4.7 Graph visualization, use-case implementation, Keras, and more

Module 05 - Keras API

5.1 Keras, a high-level neural network API that works on top of TensorFlow
5.2 Defining complex multi-output models
5.3 Composing models using Keras
5.4 Sequential and functional composition, batch normalization
5.5 Deploying Keras with TensorBoard, and customizing the neural network training process

Module 06 - TFLearn API for TensorFlow

6.1 Using the TFLearn API to implement neural networks
6.2 Defining and composing models, and deploying TensorBoard

Module 07 - DNNs (Deep Neural Networks)

7.1 Mapping the human mind with deep neural networks (DNNs)
7.2 Several building blocks of artificial neural networks (ANNs)
7.3 The architecture of a DNN and its building blocks
7.4 Reinforcement learning in DNN concepts, various parameters, layers, and optimization algorithms in DNNs, and activation functions

Module 08 - CNNs (Convolutional Neural Networks)

8.1 What is a convolutional neural network?
8.2 Understanding the architecture and use cases of CNNs
8.3 What is a pooling layer? How to visualize using a CNN
8.4 How to fine-tune a convolutional neural network
8.5 What is transfer learning?
8.6 Understanding recurrent neural networks, kernel filters, feature maps, and pooling, and deploying convolutional neural networks in TensorFlow

Module 09 - RNNs (Recurrent Neural Networks)

9.1 Introduction to the RNN model
9.2 Use cases of RNNs, modeling sequences
9.3 RNNs with back propagation
9.4 Long short-term memory (LSTM)
9.5 Recursive neural tensor network theory, the basic RNN cell, unfolded RNN, dynamic RNN
9.6 Time-series predictions

Module 10 - GPU in Deep Learning

10.1 Introduction to GPUs, how they differ from CPUs, and the significance of GPUs
10.2 Deep learning networks, forward pass and backward pass training techniques
10.3 GPU constituents with simpler cores and concurrent hardware

Module 11 - Autoencoders and Restricted Boltzmann Machine (RBM)

11.1 Introduction to RBM and autoencoders
11.2 Deploying RBM for deep neural networks, using RBM for collaborative filtering
11.3 Features and applications of autoencoders

Module 12 - Deep Learning Applications

12.1 Image processing
12.2 Natural language processing (NLP), speech recognition, and video analytics

Module 13 - Chatbots

13.1 Automated conversation bots leveraging any of the following descriptive techniques: IBM Watson, Microsoft’s LUIS, open-domain and closed-domain bots
13.2 Generative model, and the sequence-to-sequence model (LSTM)

Artificial Intelligence Assignments and Projects

Auto-Encoder Assignment

As part of this assignment, you have to implement an LSTM encoder. Create an input sequence of numbers. Build an LSTM RNN model on top of this data. Compile the model with ‘adam’ to be the optimizer and loss to be ‘mse’. Fit the model on data and set the number of epochs to be 300. Predict the values and verify it with the input data.

CNN Assignment

In this assignment, you have to build your own convolutional neural network using the MNIST dataset. For this, you will have to download the MNIST dataset through Keras. You will be asked to fit the dataset to a model and evaluate the loss and accuracy of the model. You will be working with pooling layers, dense layers, dropout layers, flatten layers, and NumPy.

Binary Classification on ‘Customer_Churn’ Using Keras

In this project, you will have to analyze the data of a telecom company to find insights and stop customers from churning out to other telecom companies. You will be working on data manipulation and visualization, and create 3 different models with the help of Keras.

Face Detection Project

For the project, you will be using Python 3.5 (64-bit) with OpenCV for face detection. The system will have to be able to detect multiple faces in a single image. You will be working with essential libraries like cv2 and glob (glob helps in finding all the pathnames matching a specified pattern).
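
The pathname matching that glob provides can be tried on its own before any face detection is involved. The sketch below creates a throwaway folder with hypothetical file names and collects only the `.jpg` files. The folder, the file names and the `find_images` helper are assumptions made for illustration; no OpenCV calls are shown.

```python
import glob
import os
import tempfile


def find_images(folder, pattern="*.jpg"):
    """Return the sorted pathnames in `folder` that match `pattern`."""
    return sorted(glob.glob(os.path.join(folder, pattern)))


# Build a temporary folder with a few empty, hypothetical files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("group_photo.jpg", "portrait.jpg", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    matches = [os.path.basename(path) for path in find_images(folder)]

print(matches)
```

In a face detection script, the list returned by a helper like this would typically be looped over so that each image path is loaded and processed in turn.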

Keras Assignment

Build a sequential model using Keras on top of the Diabetes dataset to find out if a patient has diabetes or not. You will use Stochastic Gradient Descent as the optimization algorithm. You will be required to build another sequential model where ‘Outcome’ is the dependent variable and all other columns are predictors.

MLP Assignment

You will be detecting wine fraud using neural networks as a part of this assignment. You will use the latest version of scikit-learn (>0.18). Use the wine data set from the UCI Machine Learning Repository. Import the dataset, split the data, and use the predict() method to get predictions. You will have to train your model using scikit-learn’s estimator objects.

AI and Deep Learning Intro Assignment

For this assignment, you will need to install Anaconda on your system with Python version 3.6 or above. Create a TensorFlow environment, download TensorFlow, and download Pandas, NumPy, scikit-learn, SciPy and Matplotlib in both the Anaconda and TensorFlow environments. You will also need to install Keras and TFLearn in the TensorFlow environment.

RNN Assignment

As part of the assignment, you will be using an airline-passenger dataset to predict the number of passengers for a particular month. Write a simple function to convert a single column of data into a two-column dataset. You will divide the data into train and test sets.
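
One common way to read the "single column to two columns" step is to pair each month's value with the value of the following month, so the first column is the input and the second is the value to predict. The sketch below assumes that reading. The passenger counts are made-up toy numbers, and the function names and the 67% split are illustrative choices.

```python
def to_two_columns(series):
    """Pair each value with its successor: (value at t, value at t + 1)."""
    return [(series[i], series[i + 1]) for i in range(len(series) - 1)]


def split_in_order(rows, train_fraction=0.67):
    """Split without shuffling, since the order of a time series matters."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Made-up monthly passenger counts, for illustration only.
passengers = [100, 110, 125, 120, 130, 145]

rows = to_two_columns(passengers)
train_rows, test_rows = split_in_order(rows)

print(rows)
print(len(train_rows), len(test_rows))
```

The split keeps the earliest rows for training and the latest for testing, which avoids letting the model see future months while it is being fitted.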

TensorFlow Assignment

Through this assignment, you will learn to create a session in TensorFlow. You will define constants and perform computations using the session, print ‘Hello World’ using the same session, and create a simple linear equation, y = mx + c, in TensorFlow, where m and c are variables and x is a placeholder.

TFLearn Assignment

In this assignment, you will be required to find out the factors that lead up to a patient having cancer. You will need to load the dataset and print the number of samples and features in the data. Then, you will divide the data into train and test sets and create a network.

Introduction to NoSQL and MongoDB

RDBMS, types of relational databases, challenges of RDBMS, NoSQL database, its significance, how NoSQL suits Big Data needs, introduction to MongoDB and its advantages, MongoDB installation, JSON features, data types and examples

MongoDB Installation

Installing MongoDB, basic MongoDB commands and operations, MongoChef (MongoGUI) installation and MongoDB data types

Hands-on Exercise: Install MongoDB and install MongoChef (MongoGUI)

Importance of NoSQL

The need for NoSQL, types of NoSQL databases, OLTP, OLAP, limitations of RDBMS, ACID properties, CAP theorem, BASE property, learning about JSON/BSON, databases, collections and documents, MongoDB uses, MongoDB write concerns (acknowledged, replica acknowledged, unacknowledged, journaled) and fsync

Hands-on Exercise: Write a JSON document
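
As a starting point for this exercise, the sketch below builds a small document as a Python dictionary and serializes it with the standard `json` module. The employee record and all of its field names are hypothetical. It shows the JSON value types (string, number, boolean, null, array and nested object) but not BSON-specific types such as dates or ObjectId.

```python
import json

# Hypothetical document; every field name and value is for illustration only.
employee = {
    "_id": 1001,
    "name": {"first": "Asha", "last": "Rao"},  # nested object
    "skills": ["Hadoop", "Spark", "MongoDB"],  # array
    "salary": 52000.5,                         # number
    "active": True,                            # boolean
    "manager": None,                           # null
}

document_text = json.dumps(employee, indent=2)
restored = json.loads(document_text)

print(document_text)
```

Round-tripping the text back through `json.loads` is a cheap way to confirm that the document is valid JSON before inserting it into a collection.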

CRUD Operations

Understanding CRUD and its functionality, CRUD concepts, MongoDB query syntax, read and write queries, and query optimization

Hands-on Exercise: Use insert query to create a data entry, use find query to read data, use update and replace queries to update and use delete query operations on a DB file

Data Modeling and Schema Design

Concepts of data modelling, difference between MongoDB and RDBMS modelling, model tree structure, operational strategies, monitoring and backup

Hands-on Exercise: Write a data model tree structure for a family hierarchy
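
One way to approach this exercise is the parent-reference pattern, in which every document stores the `_id` of its parent. The sketch below models that idea with plain Python dictionaries instead of a live MongoDB collection. The family members, the `parent` field name and the helper functions are all hypothetical.

```python
# Each dictionary stands in for one document; "parent" holds the parent's _id.
family = [
    {"_id": "Grandma", "parent": None},
    {"_id": "Mother", "parent": "Grandma"},
    {"_id": "Uncle", "parent": "Grandma"},
    {"_id": "Child", "parent": "Mother"},
]


def children_of(documents, parent_id):
    """Return the ids of all documents whose parent is `parent_id`."""
    return [doc["_id"] for doc in documents if doc["parent"] == parent_id]


def ancestors_of(documents, node_id):
    """Walk the parent links upward and return the chain of ancestor ids."""
    by_id = {doc["_id"]: doc for doc in documents}
    chain = []
    parent = by_id[node_id]["parent"]
    while parent is not None:
        chain.append(parent)
        parent = by_id[parent]["parent"]
    return chain


print(children_of(family, "Grandma"))
print(ancestors_of(family, "Child"))
```

Parent references make it cheap to find a node's parent or direct children, while walking a full line of ancestors takes one lookup per generation. Other tree patterns, such as storing child references or an array of ancestors, trade this off differently.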

Data Management and Administration

In this module, you will learn MongoDB® Administration activities such as health check, backup, recovery, database sharding and profiling, data import/export, performance tuning, etc.

Hands-on Exercise: Use shard key and hashed shard keys, perform backup and recovery of a dummy dataset, import data from a CSV file and export data to a CSV file

Data Indexing and Aggregation

Concepts and types of data aggregation, and data indexing concepts, properties and variations

Hands-on Exercise: Do aggregation using pipeline, sort, skip and limit and create index on data using single key and using multi-key

MongoDB Security

Understanding database security risks, MongoDB security concept and security approach and MongoDB integration with Java and Robomongo

Hands-on Exercise: MongoDB integration with Java and Robomongo

Working with Unstructured Data

Implementing techniques to work with variety of unstructured data like images, videos, log data and others and understanding GridFS MongoDB file system for storing data

Hands-on Exercise: Work with variety of unstructured data like images, videos, log data and others

What projects will I be working on in this MongoDB training?

Project: Working with the MongoDB Java Driver

Industry: General

Problem Statement: How to create table for video insertion using Java

Topics: In this project, you will work with MongoDB Java Driver and become proficient in creating a table for inserting video using Java programming. You will work with collections and documents and understand the read and write basics of MongoDB database and the Java virtual machine libraries.

Highlights:

  • Setting up MongoDB JDBC Driver
  • Connecting to the database
  • Java virtual machine libraries

Module 01 - Introduction to Microsoft Azure

1.1 Introduction to cloud computing
1.2 What is Microsoft Azure?
1.3 Microsoft Azure Services
1.4 Creating a Microsoft Azure Account
1.5 Azure CLI, Azure PowerShell
1.6 Managing Azure Resources & Subscriptions
1.7 Azure Resource Manager
1.8 Microsoft Azure Architecture

Hands-on Exercise:

1. Creating a Microsoft Azure account
2. Configuring Azure PowerShell
3. Configuring Azure CLI

Module 02 - Introduction to ARM & Azure Storage

2.1 Azure Resources & Subscriptions
2.2 Azure Resource Manager
2.3 Managing Azure Resources
2.4 Azure Tags
2.5 Azure Storage Account & its types
2.6 Azure Blob Storage
2.7 Azure Content Delivery Network (CDN)
2.8 Azure Files Storage
2.9 Azure File Sync

Hands-on Exercise: 

1. Manage Resource Groups in Azure
2. Move resource from one resource group to another
3. Apply tags
4. Create storage account
5. Access storage account
6. Create blob storage
7. Upload in blob storage
8. Create a file share
9. Creating and using CDN Endpoint

Module 03 - Introduction to Azure storage

3.1 Azure Table Storage
3.2 Azure Queue Storage
3.3 Azure Storage Explorer
3.4 Azure Shared Access Signature (SAS)
3.5 Azure Data Box
3.6 Azure Storage Replication
3.7 Data Replication Options
3.8 Azure Import/Export Service

Hands-on Exercise:

1. Attach & Detach an External Storage Account
2. Storage Explorer – Blob, file, queues and table storage
3. Backup-archive
4. Backup – Snapshots
5. Backup – AzCopy
6. Azure Shared Access Signature (SAS)
7. Use Azure Data Factory Copy Data tool to transfer data to Azure

Module 04 - Azure Virtual Machines

4.1 Azure Virtual Machines
4.2 Data Disks in Azure
4.3 Azure VMs & Interfaces
4.4 ARM templates
4.5 VHD templates
4.6 Custom Images of Azure VM
4.7 Virtual Machine Scale Sets
4.8 Virtual Machine Availability Sets

Hands-on Exercise: 

1. Creating and Configuring An Azure VM
2. Deploying a custom image of Azure VM
3. Virtual Machine Scale Sets.

Module 05 - Azure App and Container services

5.1 App Service Web App for Containers
5.2 App Service plan
5.3 Networking for an App Service
5.4 Deployment slots
5.5 Container image
5.6 Azure Kubernetes Service
5.7 Azure Container Registry

Hands-on Exercise:

1. Create an App Service Web App for Containers
2. Create a container image
3. Configure Azure Kubernetes Service
4. Publish and automate image deployment to the Azure Container Registry

Module 06 - Azure Networking - I

6.1 Azure Virtual Networks
6.2 Azure Vnet Components
6.3 IP Address – Public & Private IPs
6.4 Azure Vnet Subnets
6.5 Azure Network Interface Cards (NIC)
6.6 Network Security Group (NSG)
6.7 Route Tables
6.8 Service Tags
6.9 Azure DNS
6.10 Private DNS

Hands-on Exercise: 

1. Vnet creation
2. Create and configure vnet-vnet peering
3. Verify virtual network connectivity
4. Assign static IP to VM
5. Create route tables
6. Add routes
7. Create NIC
8. Attach NIC to VM
9. Create DNS
10. Add RecordSet
11. Create NSG
12. Add security rule to NSG
13. Attach NSG to subnet
14. Verify NSG is applied

Module 07 - Azure Networking - II

7.1 Application Gateway
7.2 Azure Front Door Service
7.3 Azure Traffic Manager
7.4 Application Security Groups
7.5 Azure Load Balancers
7.6 Azure Firewall
7.7 Azure Bastion
7.8 Network Watcher
7.9 Azure Express Route
7.10 Express Route Circuits
7.11 Express Route Peering

Hands-on Exercise:

1. Create internal load balancer
2. Create Public load balancer
3. Application Gateway
4. Implement the Azure Front Door Service
5. Implement Azure Traffic Manager
6. Deploy and configure Azure Bastion Service

Module 08 - Authentication and Authorization in Azure using RBAC

8.1 Identity and Access Management in Azure
8.2 Role-Based Access Control (RBAC)
8.3 Role Definitions
8.4 Role Assignment in Azure Resources
8.5 Azure Users & Groups
8.6 RBAC Policies

Hands-on Exercise:

1. Create a custom role for Azure Resources
2. Assign a role to configure access to Azure resources

Module 09 - Microsoft Azure Active Directory

9.1 Azure Active Directory (Azure AD)
9.2 Windows AD Vs Azure AD
9.3 Azure AD Users
9.4 Azure AD Groups
9.5 Azure AD Domains
9.6 Azure AD Tenants
9.7 Authentication Options
9.8 Azure AD Connect
9.9 Self-Service Password Reset (SSPR)
9.10 Multi-Factor Authentication (MFA)
9.11 Resource Locks

Hands-on Exercise:

1. Add or delete users using Azure Active Directory
2. Add or delete tenants using Azure Active Directory
3. Create a basic group and add members
4. Applying Resource Locks

Module 10 - Azure Monitoring

10.1 Azure Monitor
10.2 Azure Metrics
10.3 Log Analytics
10.4 Alerts and actions
10.5 Application Insights
10.6 Backup reports
10.7 Recovery Services Vault
10.8 Backing Up Azure Virtual Machines
10.9 VM Backup Policies
10.10 Restoring Azure Virtual Machines

Hands-on Exercise:

1. Configure and interpret Azure metrics
2. Configure Log Analytics
3. Query and analyse logs
4. Set up alerts and actions
5. create a Recovery Services Vault
6. Backing up and restoring a Virtual Machine

What projects will I be working on in this Microsoft Azure 104 training?

Project 1: 

Problem: Implementing a new architecture to the company’s website based on the requirements given for application gateway, storage accounts and configuring traffic manager for the same

Topics covered

  • Azure gateway
  • Azure storage accounts
  • Traffic manager
  • Azure networking
  • Azure blob storage
  • Azure containers

Project highlights:

  • Working and configuring application gateway
  • Configuring storage account to technical specifications
  • Working with blob storage
  • Vnet-vnet peering
  • Distributing traffic across different regions

Project 2: Building a dashboard to monitor your company’s website, which is running on a web app

Topics covered:

  • Azure metrics
  • Log analytics
  • Application insights
  • Alerts and actions
  • Azure monitor

Project highlights:

  • Visually correlating trends among various metrics
  • Investigating spikes and dips in metric values
  • Creating a common dashboard for various metrics
  • Raising alerts and performing relevant actions on specified conditions

Case Study 01: Introduction to Cloud computing

Problem Statement: Solving the issue of not wanting the corporation’s confidential data on the cloud while migrating to Microsoft Azure

Topics: Azure Resource manager, Azure subscriptions

Highlights:
1.1 Govern all resources separately
1.2 Tracking cost and billing of each service being used separately
1.3 Accessing and managing resource groups

Case Study 02: Microsoft Azure Storage

Problem Statement: Solving latency issues and difficulty accessing common files and tools

Topics: Azure storage account, Azure file share, CDN endpoint

Highlights:
2.1 Uploading static content to azure storage
2.2 Creating and configuring a CDN Endpoint to serve the static files that have been uploaded
2.3 Creating an azure file share and uploading content in it
2.4 Connecting a Linux and Windows server to the File share

Case Study 03: Azure Virtual Machines

Problem Statement: Managing scaling requirements using scale sets and using custom image to create a Virtual machine

Topics: Virtual Machines, Custom images

Highlights:
3.1 Automating the scaling of Virtual Machines as required
3.2 Deploying multiple identical VMs using custom VM image

Case Study 04: Microsoft Azure networking

Problem Statement: Deploy a virtual network with multiple subnets in it and enable the resources within them to communicate privately

Topics: Virtual network, Vnet peering

Highlights:
4.1 Creating a Vnet with subnets and deploying Virtual Machines in it
4.2 Establishing a connection between these subnets

Case Study 05: Load balancing and Network watcher

Problem Statement: Setting up a load balancer and a network watcher in Azure portal

Topics: Azure load balancer, Network performance monitor

Highlights:
5.1 Deploying a load balancer for the backend resources such that a single frontend IP is exposed and all the web servers can be accessed from it
5.2 Setting up a Network Performance Monitor to generate alerts

Case Study 06: Access management in Azure

Problem Statement: Providing access to some of the services managed by your organization’s active directory

Topics: Azure Active Directory, Azure Multi-Factor Authentication

Highlights:
6.1 Adding users in active directory and giving them access
6.2 Creating users in custom active directory domain and giving them access
6.3 Setting up a password authentication method
6.4 Setting up MFA with a verification option

Module 01 - Introduction to Cloud Computing & AWS

1.1 What is Cloud Computing
1.2 Cloud Service & Deployment Models
1.3 How AWS is the leader in the cloud domain
1.4 Various cloud computing products offered by AWS
1.5 Introduction to AWS S3, EC2, VPC, EBS, ELB, AMI
1.6 AWS architecture and the AWS Management Console, virtualization in AWS (Xen hypervisor)
1.7 What is auto-scaling
1.8 AWS EC2 best practices and cost involved.

Hands-on Exercise – Setting up of AWS account, how to launch an EC2 instance, the process of hosting a website and launching a Linux Virtual Machine using an AWS EC2 instance.

Module 02 - Elastic Compute and Storage Volumes

2.1 Introduction to EC2
2.2 Regions & Availability Zones(AZs)
2.3 Pre-EC2, EC2 instance types
2.4 Comparing Public IP and Elastic IP
2.5 Demonstrating how to launch an AWS EC2 instance
2.6 Introduction to AMIs, Creating and Copying an AMI
2.7 Introduction to EBS
2.8 EBS volume types
2.9 EBS Snapshots
2.10 Introduction to EFS
2.11 Instance tenancy- Reserved and Spot instances
2.12 Pricing and Design Patterns.

Hands-on Exercise –
1. Launching an EC2 instance
2. Creating an AMI of the launched instance
3. Copying the AMI to another region
4. Creating an EBS volume
5. Attaching the EBS volume with an instance
6. Taking backup of an EBS volume
7. Creating an EFS volume and mounting the EFS volume to two instances.

Module 03 - Load Balancing, Autoscaling and DNS

3.1 Introduction to Elastic Load Balancer
3.2 Types of ELB – Classic, Network and Application
3.3 Load balancer architecture
3.4 Cross-zone load balancing
3.5 Introduction to Auto Scaling, vertical and horizontal scaling, the lifecycle of Auto Scaling
3.6 Components of Auto Scaling, scaling options and policy, instance termination
3.7 Using load balancer with Auto Scaling
3.8 Pre-Route 53 – how DNS works
3.9 Routing policy, Route 53 terminologies, Pricing.

Hands-on Exercise –
1. Creating a Classic ELB
2. Creating an Application ELB
3. Creating an auto-scaling group
4. Configuring an auto-scaling group
5. Integrating ELB with Auto Scaling
6. Redirect traffic from domain name to ELB using Route 53.

Module 04 - Virtual Private Cloud

4.1 What is Amazon VPC
4.2 VPC as a networking layer for EC2
4.3 IP address and CIDR notations
4.4 Components of VPC – network interfaces, route tables, internet gateway, NAT
4.5 Security in VPC – security groups and NACL, types of VPC, what is a subnet, VPC peering with scenarios, VPC endpoints, VPC pricing and design patterns.

Hands-on Exercise –
1. Creating a VPC and subnets
2. Creating a 3-tier architecture with security groups, NACL, internet gateway and NAT gateway
3. Creating a complete VPC architecture.
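
As an illustration of the CIDR notation covered in this module, the sketch below uses Python's standard ipaddress module to split an address range into subnets. The 10.0.0.0/16 range and the public/private labels are assumptions chosen for the example; they are not taken from the course material.

```python
import ipaddress

# Hypothetical VPC address range; any private CIDR block would work here.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Split the /16 into /24 subnets and pick the first two for illustration.
subnets = list(vpc.subnets(new_prefix=24))
public_subnet, private_subnet = subnets[0], subnets[1]

print(vpc.num_addresses)   # number of addresses covered by the /16
print(public_subnet)       # 10.0.0.0/24
print(private_subnet)      # 10.0.1.0/24

# Check whether a given address falls inside the private subnet.
print(ipaddress.ip_address("10.0.1.25") in private_subnet)
```

The smaller the prefix length, the larger the range: a /16 covers 65,536 addresses, while each /24 carved out of it covers 256.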

Module 05 - Storage - Simple Storage Service (S3)

5.1 Introduction to AWS storage
5.2 Pre-S3 – online cloud storage
5.3 API, S3 consistency models
5.4 Storage hierarchy, buckets in S3
5.5 Objects in S3, metadata and storage classes, object versioning, object lifecycle management, cross-region replication, data encryption, connecting using VPC endpoint, S3 pricing.

Hands-on Exercise –
1. Creating an S3 bucket
2. Uploading objects to the S3 bucket
3. Enabling object versioning in the S3 bucket
4. Setting up lifecycle management for only a few objects
5. Setting up lifecycle management for all objects with the same tag
6. Static website hosting using S3.

Module 06 - Databases and In-Memory DataStores

6.1 What is a database, types of databases, databases on AWS
6.2 Introduction to Amazon RDS
6.3 Multi-AZ deployments, features of RDS
6.4 Read replicas in RDS, reserved DB instances
6.5 RDS pricing and design patterns
6.6 Introduction to Amazon Aurora, benefits of Aurora, Aurora pricing and design patterns
6.7 Introduction to DynamoDB, components of DynamoDB, DynamoDB pricing and design patterns
6.8 What is Amazon Redshift, advantages of Redshift
6.9 What is ElastiCache, why ElastiCache.

Hands-on Exercise –
1. Launching a MySQL RDS instance
2. Modifying an RDS instance
3. Connecting to the DB instance from your machine
4. Creating a multi-az deployment
5. Create an Aurora DB cluster
6. Creating an Aurora replica
7. Creating a DynamoDB table.

Module 07 - Management and Application Services

7.1 Introduction to CloudFormation
7.2 CloudFormation components
7.3 CloudFormation templates
7.4 The concept of Infrastructure as Code
7.5 Functions and pseudo parameters
7.6 Introduction to Simple Notification Service, how does SNS work
7.7 Introduction to Simple Email Service, how does SES work
7.8 Introduction to Simple Queue Service, how does SQS work.

Hands-on Exercise –
1. Creating a CloudFormation stack
2. Launching a t2.micro EC2 instance using CloudFormation
3. Using CloudFormation to automate an architectural deployment
4. Creating an SNS topic, creating a subscription within the topic
5. Setting up SES and sending a mail
6. Creating an SQS queue and sending a sample message.

Module 08 - Access Management and Monitoring Services

8.1 Pre-IAM, why access management
8.2 Amazon Resource Name (ARN), IAM features
8.3 Multi-factor authentication (MFA) in IAM, JSON
8.4 IAM policies, IAM permissions, IAM roles, identity federation, pricing
8.5 Introduction to CloudWatch, metrics and namespaces, CloudWatch architecture, dashboards in CW, CloudWatch alarms, CloudWatch logs, pricing and design patterns
8.6 Introduction to CloudTrail, tracking API usage.

Hands-on Exercise –
1. Creating IAM users and a group
2. Creating an IAM policy and attaching it to the group
3. Creating an IAM role
4. Setting up MFA for a user
5. Creating a CloudWatch dashboard and adding metrics
6. Creating a CloudWatch alarm which triggers according to CPU Utilization of an EC2 instance
7. Creating a billing alarm
8. Creating a log group
9. Creating a trail.
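
To make the JSON policy topic in this module concrete, here is a minimal sketch that builds an IAM policy document in Python. The bucket name and the choice of read-only S3 actions are assumptions made for illustration; the sketch only constructs and prints the JSON and does not call AWS.

```python
import json

def read_only_bucket_policy(bucket_name):
    """Build a policy document granting read-only access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",      # the bucket itself (for listing)
                    f"arn:aws:s3:::{bucket_name}/*",    # the objects inside it
                ],
            }
        ],
    }

# "example-course-bucket" is a hypothetical bucket name.
policy = read_only_bucket_policy("example-course-bucket")
print(json.dumps(policy, indent=2))
```

The printed document could then be pasted into the policy editor when creating the IAM policy in the exercise above.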

Module 09 - Automation and Configuration management

9.1 What is AWS Lambda
9.2 How Lambda is different from EC2
9.3 Benefits and limitations of Lambda
9.4 How does Lambda work
9.5 Use cases of Lambda, Lambda concepts
9.6 Integrating S3 with Lambda
9.7 What is Elastic Beanstalk, how does Beanstalk work, Beanstalk concepts, Beanstalk pricing
9.8 What is configuration management
9.9 What is AWS OpsWorks, AWS OpsWorks benefits
9.10 CloudFormation vs OpsWorks, services in OpsWorks, AWS OpsWorks Stacks, OpsWorks pricing.

Hands-on Exercise –
1. Creating a Lambda function
2. Setting up Lambda triggers and destinations
3. Creating an Elastic Beanstalk application
4. Uploading a new version of the application to Beanstalk
5. Creating a stack in OpsWorks
6. Launching the instance using OpsWorks and automatically installing the application.

Module 10 - Amazon FSx and Global Accelerator

10.1 What is FSx
10.2 Types of FSx, FSx for Windows File Server
10.3 How does FSx for Windows File Server work, FSx for Lustre
10.4 Use cases of FSx
10.5 Automatic failover process
10.6 Supported clients and access methods
10.7 What is a Global Accelerator, How Global Accelerator works, Listeners and Endpoints
10.8 What are AWS Organizations, Features of AWS Organizations, Managing multiple accounts
10.9 What are ENIs, ENAs and EFAs, Working with network interfaces
10.10 Enhanced Networking with ENA, EFA with MPI, Monitoring an EFA

Hands-on Exercise:
1. Creating a shared FSx file system between two windows instances
2. Accessing one instance with multiple Elastic IPs using ENI
3. Using Global Accelerator to map instances from 2 regions into one domain name
4. Enabling Enhanced Networking on an Ubuntu instance

Self Paced

Module 11 - Architecting AWS – whitepaper

11.1 Important guidelines for creating a well-architected AWS framework that is resilient and performant
11.2 Designing of fault-tolerant and high-availability architecture
11.3 Resilient storage
11.4 Decoupling mechanism
11.5 Multi-tier architecture solution
11.6 Disaster recovery solution
11.7 Scalable and elastic solutions.

Module 12 - DevOps on AWS

12.1 What is DevOps
12.2 Introduction to AWS DevOps
12.3 AWS Developer tools – CodeCommit, CodeBuild, CodeDeploy and CodePipeline, integrating GitHub with CodePipeline
12.4 Creating a DevOps lifecycle using AWS DevOps tools.

Module 13 - AWS Migration

13.1 What is Cloud migration
13.2 Why migration is important
13.3 Migration process in AWS, the 6 R’s migration strategy
13.4 Virtual machine migration, migrating a local vm onto the AWS cloud
13.5 Migrating databases using Database Migration Service (DMS)
13.6 Migrating a local database to RDS
13.7 Migrating an on-premises database server to RDS using DMS, other migration services.

Module 14 - AWS Architect Interview Questions

14.1 Guidance for clearing the exam, most probable interview questions and other helpful tips for clearing the exam and interview.

AWS Projects Covered

Deploying a Multi-Tier Website on AWS

Use various AWS services such as EC2, ELB, Auto Scaling, VPC, etc. to create a highly available and reliable architecture to host a PHP website. Furthermore, use SNS for sending mails about all your website’s operations on AWS, deploy the application in a private subnet and use ELB to expose it. Prevent the website from crashing by dynamically scaling your servers.

Deploying a Website for High Availability and High Resilience

The architecture should be designed to be highly available. Based on the application’s workload, it should automatically scale its servers up and down. An ELB is a must to balance the load across these servers, and the architecture should be decoupled, connecting an RDS database with an Elastic Beanstalk environment.

Sending Notifications to patients using push notifications

Design an architecture to send notifications to patients based on their doctors’ feedback. Using SNS for sending messages will increase reliability and resilience. Integrate EC2 with the SNS topic for message storing, and secure the EC2 instances by using public and private subnets.

Application to sort objects in an S3 bucket using Beanstalk and Lambda

Upload to Elastic Beanstalk an application that can upload objects to an S3 bucket. Set your Lambda function’s trigger as object creation in the S3 bucket to which the Beanstalk application uploads the objects. Add Lambda code that segregates the uploaded objects into separate buckets according to their extension (e.g., .png, .pdf, etc.).
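
The routing logic at the heart of such a Lambda function might look like the sketch below. The destination bucket names are hypothetical, and the sketch only decides where each object should go. Copying the object would need the AWS SDK and is left out. It assumes the standard S3 event layout, where each record carries the bucket name and a URL-encoded object key.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical mapping from file extension to destination bucket.
DESTINATIONS = {
    ".png": "example-images-bucket",
    ".pdf": "example-documents-bucket",
}
DEFAULT_BUCKET = "example-other-bucket"

def destination_bucket(object_key):
    """Pick a destination bucket from the object's file extension."""
    _, extension = os.path.splitext(object_key)
    return DESTINATIONS.get(extension.lower(), DEFAULT_BUCKET)

def plan_moves(event):
    """Return (source bucket, key, destination bucket) for each event record."""
    moves = []
    for record in event["Records"]:
        source = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        moves.append((source, key, destination_bucket(key)))
    return moves

# A hand-written sample event with one uploaded object.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "uploads"},
                "object": {"key": "reports/Q1+summary.PDF"}}}
    ]
}
print(plan_moves(sample_event))
```

Keeping the decision in a plain function like destination_bucket makes it easy to test locally before wiring it to the S3 trigger.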

Case Study 1 - Using Different Operations on EC2 and EBS

The case study is to replicate or copy EC2 instances to varied regions depending upon the High Availability. Furthermore, the main strategy is to extend the size of EBS volumes without losing the data. The major highlights of this case study are to scale and mount the EBS volumes to different EC2 instances one at a time.

Case Study 2 - Autoscaling Compute Capacity in AWS

The major aim of this case study is to Autoscale (scaling up and down automatically) and Load Balance among multiple EC2 instances within AWS based on varied/defined metrics for Autoscaling instances. The case study also deals with routing custom domains to AWS resources.

Case Study 3 - Creating custom VPCs in AWS

In this case study, the candidate will create a custom VPC in AWS with the help of multiple subnets having both private as well as public access. The route tables are also configured to subnets using the Internet Gateway and NAT Gateway.

Case Study 4 - Using AWS S3 for Lifecycle Access Management

The case study deals with moving artifacts from on-premises to S3 in the most cost-efficient manner. Furthermore, it deals with the creation of Lifecycle rules for events in S3 objects, hosting a static website, and experimenting with the usage of Route 53.

Case Study 5 - Highly available Relational Database in AWS

This case study is all about creating a highly available and scalable database service in AWS using RDS. The process involves creating the database architecture, collecting data for real-time analysis, and resolving latency issues.

Case Study 6 - CloudFormation for Infrastructure as Code

The case study involves provisioning and deploying AWS Resources using AWS CloudFormation. Within the process, the candidate has to define rules for deletion using IaC and also minimize the deployment time.

Case Study 7 - Administering user access using AWS IAM

In this case study, the candidate will create users in IAM for defining granular access that differs with each user. Alongside he/she will also define custom policies that add users to groups.

Case Study 8 - Application Services in AWS and Configuration Management

The main aim of this case study is to use AWS Lambda for deploying code and to conduct configuration management using OpsWorks. Alongside, a web app is also deployed to Elastic Beanstalk.

HBase Overview

Getting started with HBase, core concepts of HBase and understanding HBase with an example

Architecture of NoSQL

Why HBase, where to use HBase and what is NoSQL

HBase Data Modeling

HDFS vs. HBase, HBase Use Cases and Data Modeling HBase

HBase Cluster Components

HBase Architecture and main components of HBase cluster

HBase API and Advanced Operations

HBase Shell, HBase API, primary operations and advanced operations

Integration of Hive with HBase

Create a table and insert data into it and integration of Hive with HBase and Load Utility

File Loading with Both Load Utilities

Putting Folder to VM and file loading with both load utilities

What projects will I be working on in this HBase training?

Project: Integrate Hive and Java with HBase

Industry: General

Problem Statement: How to create HBase table with Hive

Topics: This is a project that gives you hands-on experience to connect Hive and Java with HBase. Hive is used for querying data using HiveQL. In this project, you will do HBase installation, create Hive tables, import the data onto Hive from HBase and use HiveQL for Hive table data querying, analyzing and managing the HBase table. You will also learn to write Java code for writing HBase queries.

Highlight:

  • Creation of HBase tables
  • Converting SQL queries into MapReduce
  • Running HBase queries with Java

Advantages and Usage of Cassandra

Introduction to Cassandra, its strengths and deployment areas

CAP Theorem and NoSQL Database

Significance of NoSQL, RDBMS Replication, Key Challenges, types of NoSQL, benefits and drawbacks, salient features of NoSQL database. CAP Theorem, Consistency.

Cassandra fundamentals, Data model, Installation and setup

Installation, introduction to Cassandra, key concepts and deployment of non-relational database, column-oriented database, Data Model – column, column family

Cassandra Configuration

Token calculation, Configuration overview, Node tool, Validators, Comparators, Expiring column, QA

Summarization, node tool commands, cluster, Indexes, Cassandra & MapReduce, Installing Ops-center

How Cassandra modelling varies from Relational database modelling, Cassandra modelling steps, introduction to Time Series modelling, comparing Column family Vs. Super Column family, Counter column family, Partitioners, Partitioners strategies, Replication, Gossip protocols, Read operation, Consistency, Comparison

Multi Cluster setup

Creation of multi node cluster, node settings, Key and Row cache, System Key space, understanding of Read Operation, Cassandra Commands overview, VNodes, Column family

Thrift/Avro/Json/Hector Client

JSON, Hector client, AVRO, Thrift, JAVA code writing method, Hector tag

DataStax installation and secondary index

Cassandra management, commands of node tool, MapReduce and Cassandra, Secondary index, Datastax Installation

Advance Modelling

Rules of Cassandra data modelling, increasing data writes, duplication, and reducing data reads, modelling data around queries, creating table for data queries

Deploying the IDE for Cassandra applications

Understanding the Java application creation methodology, learning key drivers, deploying the IDE for Cassandra applications, cluster connection and data query implementation

Cassandra Administration

Learning about Node Tool Utility, cluster management using Command Line Interface, Cassandra management and monitoring via DataStax Ops Center.

Cassandra API and Summarization and Thrift

Cassandra client connectivity, connection pool internals, API, important features and concepts of Hector client, Thrift, JAVA code, Summarization.

What projects will I be working on in this Cassandra training?

Type: Deploying the IDE for Cassandra applications

Topics: This project gives you hands-on experience in installing and working with Apache Cassandra, which is a high-performance and extremely scalable database for distributed data with no single point of failure. You will deploy the Java Integrated Development Environment for running Cassandra, learn about the key drivers, work with Cassandra applications in a cluster setup and implement data querying techniques.

Introduction to Couchbase

The architecture of Couchbase, understanding Couchbase distributed NoSQL database engine, vBuckets for information distribution on Couchbase cluster, user and system requirements and Couchbase downloading and installation

Single-node Implementation

Couchbase single-node deployment for development purpose

Couchbase Web Console

Managing the Couchbase environment with the Web Console tool, configuring the Couchbase server and management and working with Couchbase data buckets, default bucket sizing and administration

Couchbase Multi-node Cluster

Methods for deploying Couchbase in a multi-node cluster: first, with all Couchbase Servers on one machine and, second, with each Couchbase Server on its own machine

Couchbase Command-line Interface

The Couchbase Command-line Interface tools for managing and monitoring single-node and multi-node clusters, Servers and vBuckets and developing reports for log data collection

What projects will I be working on in this Couchbase training?

Topics: This project involves working with the Couchbase command-line interface tools that are used for managing of clusters in a multi-node or single-node setup, working with vBuckets in Couchbase Server, deploying reports for log data collection. You will gain hands-on experience in deploying commands like start, stop and report status for log collection. It also includes working with Couchbase-cli, cbcollect_info tool and so on. Upon the completion of the project, you will be proficient in using Couchbase CLI for managing and monitoring clusters and data replication using XDCR.

Module 01 - Introduction to Machine Learning

1.1 Need of Machine Learning
1.2 Introduction to Machine Learning
1.3 Types of Machine Learning, such as supervised, unsupervised, and reinforcement learning, Machine Learning with Python, and the applications of Machine Learning

Module 02 - Supervised Learning and Linear Regression

2.1 Introduction to supervised learning and the types of supervised learning, such as regression and classification
2.2 Introduction to regression
2.3 Simple linear regression
2.4 Multiple linear regression and assumptions in linear regression
2.5 Math behind linear regression

Hands-on Exercise:

1. Implementing linear regression from scratch with Python
2. Using Python library Scikit-Learn to perform simple linear regression and multiple linear regression
3. Implementing train–test split and predicting the values on the test set
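
As a rough idea of what "from scratch" means in the first and third exercises, the sketch below fits a simple linear regression with the closed-form least-squares formulas and evaluates it on a held-out test set. The data is synthetic (y = 2x + 1) and the 8/2 split is an arbitrary choice for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(xs, slope, intercept):
    return [slope * x + intercept for x in xs]

# Synthetic data following y = 2x + 1 exactly.
xs = [float(x) for x in range(1, 11)]
ys = [2 * x + 1 for x in xs]

# Simple train-test split: first 8 points for training, last 2 for testing.
train_x, test_x = xs[:8], xs[8:]
train_y, test_y = ys[:8], ys[8:]

slope, intercept = fit_simple_linear_regression(train_x, train_y)
predictions = predict(test_x, slope, intercept)
print(slope, intercept)
print(predictions, test_y)
```

In practice a random split is preferable to taking the first rows; the fixed split here only keeps the example deterministic.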

Module 03 - Classification and Logistic Regression

3.1 Introduction to classification
3.2 Linear regression vs logistic regression
3.3 Math behind logistic regression, detailed formulas, the logit function and odds, confusion matrix and accuracy, true positive rate, false positive rate, and threshold evaluation with ROCR

Hands-on Exercise:

1. Implementing logistic regression from scratch with Python
2. Using Python library Scikit-Learn to perform simple logistic regression and multiple logistic regression
3. Building a confusion matrix to find out accuracy, true positive rate, and false positive rate
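
The metrics named in this module can be computed by hand as in the sketch below. The labels and predicted probabilities are made-up values, and 0.5 is an assumed classification threshold.

```python
import math

def sigmoid(z):
    """Map a real-valued score (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

# Made-up true labels and model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
probabilities = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

threshold = 0.5  # assumed cut-off; moving it trades TPR against FPR
y_pred = [1 if p >= threshold else 0 for p in probabilities]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + tn + fn)
true_positive_rate = tp / (tp + fn)
false_positive_rate = fp / (fp + tn)
print(tp, fp, tn, fn)
print(accuracy, true_positive_rate, false_positive_rate)
```

Repeating the calculation for several thresholds gives the points of an ROC curve.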

Module 04 - Decision Tree and Random Forest

4.1 Introduction to tree-based classification
4.2 Understanding a decision tree, impurity function, entropy, and understanding the concept of information gain for the right split of node
4.3 Understanding the concepts of information gain, impurity function, Gini index, overfitting, pruning, pre-pruning, post-pruning, and cost-complexity pruning
4.4 Introduction to ensemble techniques, bagging, and random forests and finding out the right number of trees required in a random forest

Hands-on Exercise:

1. Implementing a decision tree from scratch in Python
2. Using Python library Scikit-Learn to build a decision tree and a random forest
3. Visualizing the tree and changing the hyper-parameters in the random forest
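
The impurity measures listed in 4.2 and 4.3 are short enough to write out directly. The sketch below computes entropy, Gini index and information gain for a toy set of labels. The labels and the candidate split are invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]   # children are as mixed as the parent

print(entropy(parent), gini(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A tree builder would evaluate every candidate split this way and keep the one with the highest gain.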

Module 05 - Naïve Bayes and Support Vector Machine (self-paced)

5.1 Introduction to probabilistic classifiers
5.2 Understanding Naïve Bayes and math behind the Bayes theorem
5.3 Understanding a support vector machine (SVM)
5.4 Kernel functions in SVM and math behind SVM

Hands-on Exercise:

1. Using Python library Scikit-Learn to build a Naïve Bayes classifier and a support vector classifier

Module 06 - Unsupervised Learning

6.1 Types of unsupervised learning, such as clustering and dimensionality reduction, and the types of clustering
6.2 Introduction to k-means clustering
6.3 Math behind k-means
6.4 Dimensionality reduction with PCA

Hands-on Exercise:

1. Using Python library Scikit-Learn to implement k-means clustering
2. Implementing PCA (principal component analysis) on top of a dataset
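
Before reaching for Scikit-Learn, it can help to see the assign-and-update loop behind k-means written out. The sketch below is a bare-bones version that naively uses the first k points as starting centroids (an assumption made to keep it deterministic; library implementations choose starting points more carefully). The six sample points are invented.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [squared_distance(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

def kmeans(points, k, max_iterations=100):
    centroids = list(points[:k])  # naive initialization: first k points
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # New centroid is the mean of the cluster's points.
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assign(points, centroids)

points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```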

Module 07 - Natural Language Processing and Text Mining (self-paced)

7.1 Introduction to Natural Language Processing (NLP)
7.2 Introduction to text mining
7.3 Importance and applications of text mining
7.4 How NLP works with text mining
7.5 Writing and reading to word files
7.6 Natural Language Toolkit (NLTK) environment
7.7 Text mining: Its cleaning, pre-processing, and text classification

Hands-on Exercise:

1. Learning Natural Language Toolkit and NLTK Corpora
2. Reading and writing .txt files from/to a local drive
3. Reading and writing .docx files from/to a local drive

Module 08 - Introduction to Deep Learning

8.1 Introduction to Deep Learning with neural networks
8.2 Biological neural networks vs artificial neural networks
8.3 Understanding perceptron learning algorithm, introduction to Deep Learning frameworks, and TensorFlow constants, variables, and place-holders

Module 09 - Time Series Analysis (self-paced)

9.1 What is time series? Its techniques and applications
9.2 Time series components
9.3 Moving average, smoothing techniques, and exponential smoothing
9.4 Univariate time series models
9.5 Multivariate time series analysis
9.6 ARIMA model and time series in Python
9.7 Sentiment analysis in Python (Twitter sentiment analysis) and text analysis

Hands-on Exercise:

1. Analyzing time series data
2. Recognizing the nature of the phenomenon represented by a sequence of measurements that follow a non-random order
3. Forecasting the future values in the series
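
The smoothing techniques from 9.3 can be sketched in a few lines of plain Python. The series values, the window of 3 and the smoothing factor of 0.5 below are arbitrary choices for illustration.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [series[0]]  # start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

print(moving_average([1, 2, 3, 4, 5], window=3))       # [2.0, 3.0, 4.0]
print(exponential_smoothing([10, 20, 30], alpha=0.5))  # [10, 15.0, 22.5]
```

A larger alpha makes the smoothed series react faster to recent values; a smaller one smooths more heavily.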

Machine Learning Projects

Analyzing the Trends of COVID-19 with Python

In this project, you will be using Pandas to accumulate data from multiple data files, Plotly to create interactive visualizations, Facebook’s Prophet library to make time series models, and visualizing the prediction by combining these technologies.

Customer Churn Classification

This project will help you get more familiar with Machine Learning algorithms. You will be manipulating data to gain meaningful insights, visualizing data to figure out trends and patterns among different factors, and implementing algorithms like linear regression, decision tree, and Naïve Bayes.

Creating a Recommendation System for Movies

You will be creating a recommendation system for movies by working with rating prediction, item prediction, user-based methods in k-nearest neighbor, matrix factorization, singular value decomposition, collaborative filtering, business variables overview, etc. Two approaches you will use are memory-based and model-based.

Case Study 1 - Decision Tree

Conducting this case study will help you understand the structure of a dataset (PIMA Indians Diabetes database) and create a decision tree model based on it by making use of Scikit-Learn.

Case Study 2 - Insurance Cost Prediction (Linear Regression)

In this case study, you will understand the structure of a medical insurance dataset, implement both simple and multiple linear regressions, and predict values for the insurance cost.

Case Study 3 - Diabetes Classification (Logistic Regression)

Through this case study, you will come to understand the structure of a dataset (PIMA Indians Diabetes dataset), implement multiple logistic regressions and classify, fit your model on the test and train data for prediction, evaluate your model using confusion matrix, and then visualize it.

Case Study 4 - Random Forest

You will be creating a model that would help in classifications of patients in the following ways: ‘is normal,’ ‘is suspected to have a disease,’ or in actuality ‘has the disease’ with the help of the ‘Cardiotocography’ dataset.

Case Study 5 - Principal Component Analysis (PCA)

As part of the case study, you will read the sample Iris dataset. You will use PCA to figure out the number of most important principal features and reduce the number of features using PCA. You will have to train and test the random forest classifier algorithm to check the model performance. Find the optimal number of dimensions that will give good quality results and predict accurately.

Case Study 6 - K-means Clustering

This case study involves data analysis, column extraction from the dataset, data visualization, using the elbow method to find out the appropriate number of groups or clusters for the data to be segmented, using k-means clustering, segmenting the data into k groups, visualizing a scatter plot of clusters, and many more.
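The elbow method can be sketched in plain Python as follows. The 2-D points are invented so that three groups are clearly visible, the evenly spaced initial centroids are a simplification chosen to keep the run deterministic, and `sq_dist` and `kmeans` are hypothetical helper names.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points; returns (centroids, inertia)."""
    # Deterministic start: take k evenly spaced points as initial centroids.
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl
            else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    # Inertia: sum of squared distances from each point to its nearest centroid.
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia

# Hypothetical data with three visually separate groups.
points = [(1, 1), (1, 2), (2, 1), (5, 5), (5, 6), (6, 5), (9, 1), (9, 2), (10, 1)]

# Elbow method: compute the inertia for several values of k and look for the bend.
inertias = {k: kmeans(points, k)[1] for k in range(1, 5)}
```

On this data the inertia drops sharply up to k = 3 and then flattens, which is the "elbow" that suggests three clusters.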

Fundamentals of Search Engine and Apache Lucene

Introduction to search engines and Apache Lucene, understanding the inverted index, documents and fields.
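The idea behind an inverted index can be shown with a few lines of plain Python. This is only a conceptual sketch, not Lucene's actual data structure or API: real analyzers also handle tokenization rules, stop words, and scoring. The documents and the `build_inverted_index` and `search` helpers are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercase term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Return IDs of documents that contain every term in the query (AND semantics)."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

# Hypothetical documents keyed by document ID.
docs = {
    1: "Lucene is a search library",
    2: "Solr is a search server built on Lucene",
    3: "Kafka is a streaming platform",
}

index = build_inverted_index(docs)
matches = search(index, "search lucene")
```

Looking up a term is a dictionary access instead of a scan over every document, which is what makes full-text search fast.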

Analyzers in Lucene

Introduction to the various query types available in Lucene and clear understanding of these.

Exploring Apache Lucene

Understanding the prerequisites for using Apache Lucene, learning about the querying process, analyzers, scoring, boosting, faceting, grouping, highlighting, the various types of geographical and spatial searches, introduction to Apache Tika.

Apache Lucene Demonstration

Demonstration of the Apache Lucene workings.

Apache Lucene advanced

Understanding the Analyzer, Query Parser in Apache Lucene, Query Object, Stopword.

Advanced topics of Apache Lucene (practical)

Understanding the various aspects of Apache Lucene like Scoring, Boosting, Highlighting, Faceting and Grouping

Apache Solr

Introduction to Apache Solr, the advantages of Apache Solr over Apache Lucene, the basic system requirements for using Apache Solr, introduction to Cores in Apache Solr.

Apache Solr Indexing

Introduction to the Apache Solr indexing, index using built-in data import handler and post tool, understanding the Solrj Client and configuration of Solrj Client.

Solr Indexing continued

Demonstrating the Book Store use cases with Solr Indexing with practical examples, learning to build Schema, the field, field types, CopyField and Dynamic Field, understanding how to add, explore, update, and delete using Solrj.

Apache Solr Searching

The various aspects of Apache Solr search like sorting, pagination, an overview of the request parameters, faceting and highlighting.

Deep dive into Apache Solr

Understanding the Request Handlers, defining and mapping to search components, highlighting and faceting, updating managed schemas, request parameters hardwiring, adding fields to default search, the various types of Analyzers, Parsers, Tokenizers.

Apache Solr continued

Grouping of results in Apache Solr, Parse queries functions, fuzzy query in Apache Solr.

Extended Features

The extended features in Apache Solr, learning about Pseudo-fields, Pseudo-Joins, Spell Check, suggestions, Geospatial Search, multi-language search, stop words and synonyms.

Multicore

Understanding the concept of Multicore in Solr, the creation of Multicore in Solr, the need of Multicore, Joining of data, Replication and Ping Handler.

Administration & SolrCloud

Understanding SolrCloud, the concepts of sharding, indexing, and replication in Apache SolrCloud, the working of Apache SolrCloud, distributed requests, read-side and write-side fault tolerance, cluster coordination using Apache ZooKeeper.

What projects will I be working on in this Apache Solr training?

Project – Designing and developing simple ecommerce search engine database

Topics : This project involves deploying Apache Solr for search engine database design. You will be exposed to real-world scenarios in the eCommerce domain, such as searching customer history, purchase orders, personalization, targeted marketing opportunities, and more using Solr. Get hands-on experience in Solr Multi Core, Core creation, Solrj, Stop words, Auto complete, Auto Suggest, Synonyms, and Managed schema. Upon completion of the project, you will understand distributed indexing and other salient features of Solr like resilience, scalability, replication, reliability, centralized configuration, and recovery.

Introduction to Linux

Introduction to Linux, Basics of Shell, Basics of Kernel, CentOS 8 installation and VBox additions, Basic Linux Commands, ECHO and EXPR command, Set and unset a variable, Header of a shell script (#!).

Hands-on Exercise – Executing basic Linux commands, Installing CentOS 8 on VirtualBox and adding guest additions to the installed OS.

File Management

Text editors and file creation; Users, Groups and Processes; Root and Linux file hierarchy, Understanding file hierarchy, Understanding file permissions, chmod and chown commands, the LS command, Metacharacters, Editing a file using VIM, Displaying contents of a file, Copy, Move and Remove files.

Hands-on Exercise – Using VIM, Creating users and groups, Creating files and directories, Assigning file permissions and ownership using chmod and chown, Editing files in VIM.

Files and Processes

Everything is a file in UNIX/Linux (files, directories, executables, processes), Process control commands (ps and kill), other process control tools (top, nice, renice).

Hands-on Exercise – Executing ps and kill commands on running services, Monitoring the OS using top.

Introduction to Shell Scripting

What is shell scripting, Types of shell, Creating and writing a shell script, Changing the permission of the shell script, Executing the script, Environment variables, Defining a local and a global variable, User input in a shell script.

Hands-on Exercise – Creating a shell script, Writing and executing the shell script, creating a local and a global variable, taking input from the user in a shell script.

Conditional, Looping statements and Functions

What are Conditional statements, Using IF, IF-ELSE, Nested IF statements, What are Looping statements, Using WHILE, UNTIL and FOR statements, Using the case…esac statement, What is a Function, Creating a function in Linux, Calling functions.

Hands-on Exercise – Executing IF, IF-ELSE, Nested IF statements, Executing WHILE, UNTIL and FOR statements, Executing the case…..esac statement, creating a function in multiple ways, calling a function in a file, calling a function from another file.

Text Processing

Using GREP command, Using SED command, Using AWK command, Mounting a file to the virtual box, Creating a shared folder (mounting a folder), Using SORT command and Using pipes to combine multiple Commands.

Hands-on Exercise – Executing commands using GREP, Executing commands using SED, Executing commands using AWK, Mounting a folder in the Windows OS to the Linux OS, Installing VirtualBox guest additions on CentOS 8, Extracting zipped files.

Scheduling Tasks

What are Daemons, Introduction to Task scheduling in Linux, Scheduling a job in Linux, What is Cron and Crontab, How to use cron, Using the AT command.

Hands-on Exercise – Starting, Stopping and Restarting Daemon processes, Scheduling jobs using cron and crontab, Scheduling a one time task using AT, Managing scheduled tasks using ATQ and ATRM.

Advanced Shell Scripting

Why monitoring, Introduction to process monitoring, Top vs HTop, What does PGREP do, Introduction to file and folder monitoring, Monitoring tool inotifywait, inotifywait options for folder monitoring, Events of a folder in inotify, the FREE command.

Hands-on Exercise – Using Top to monitor the OS, Installing Htop, Using Htop to monitor the OS, Filtering and sorting using Htop, Installing inotify tools, monitoring a folder using inotifywait, monitoring a folder only for certain events, using the FREE command.

Database Connectivity

Installing and configuring MySQL, Securing MySQL, Running Queries from terminal, Running Queries from a shell script.

Hands-on Exercise – Downloading and installing MySQL, Connecting to MySQL from terminal, Querying directly from the terminal, Pushing the query result inside a file, CRUD operations from a shell script.

Linux Networking

What is networking in Linux, Why do we need networking, Using networking commands – IFCONFIG, PING, Wget and cURL, SSH, SCP and FTP, Learning Firewall tools – iptables and firewalld, DNS and Resolving IP address, /etc/hosts and /etc/hostname, nslookup and dig.

Hands-on Exercise – Executing all the networking commands, Using iptables and firewalld, Adding and removing ports, Resolving IP address in /etc/hosts, looking into a website’s IP and nameservers using nslookup and dig.

What projects will I be working on in this Linux Admin training?

Project: Installing WordPress on CentOS 7

Industry: Internet related

Problem Statement: How to install the LAMP stack on CentOS 7 and create a database for WordPress

Topics: In this project, you will create your account on WordPress (with a database), then apply the changes using FLUSH PRIVILEGES, and install a PHP module. You can get that package directly from CentOS’s default repositories using yum, and you will also install and update WordPress for the latest templates and formats.

Highlight

  • CentOS server installation
  • Creating a MySQL database
  • WordPress installation & configuration

Core Java Concepts

Introduction to Java Programming, Defining Java, Need for Java, Platform Independence in Java, Define JRE, JVM, JDK, Important Features and Evolution of Java

Writing Java Programs using Java Principles

Overview of Coding basics, Setting up the required environment, Knowing the available IDEs, Writing a Basic-level Java Program, Define Package, What are Java Comments?, Understanding the concept of Reserved Words, Introduction to Java Statements, What are Blocks in Java, Explain a Class, Different Methods

Language Conceptuals

Overview of the Language, Defining Identifiers, What are Constants and Variables, What is an Encoding Set?, Concept of Separators, Define Primitives, How to make Primitive Conversions?, Various Operators in Java

Operating with Java Statements

Module Overview, Learn how to write If Statement, Understanding While Statement, Working with Do-while Statement, How to use For Statement?, Using Break Statement, What is Continue Statement, Working of Switch Statement

Concept of Objects and Classes

General Review of the Module, Defining Objects and Classes in Java, What are Encapsulation, Static Members and Access Control?, Use and importance of ‘this’ Keyword, Defining Method Overloading with an example, ‘By Value’ vs. ‘By Reference’, Loading, Defining Initialization and Linking, How to Compare Objects in Java?, What is Garbage Collector?

Introduction to Core Classes

General Review, Concept of Object in Java, Define Core Class, What is System?, Explain String Classes, How do Arrays work?, Concept of Boxing & Unboxing, Use of ‘varargs’, ‘format’ and ‘printf’ Methods

Inheritance in Java

Introduction, Define Inheritance with an example, Accessibility concept, Method Overriding, Learning how to call a Superclass’ Constructor, What is Type Casting?, Familiarity with ’instanceof’ Keyword

Exception Handling in Detail

Getting started with exception Handling, Defining an Exception, How to use Constructs to deal with exceptions?, Classification of exceptions, Throw Exceptions, How to create an exception class?, stack Trace analysis

Getting started with Interfaces and Abstract Classes

General Review, Defining Interface, How to use and create an Interface, Concept of Extending interfaces, How to implement multiple interfaces?, What are abstract classes?, How to create and use abstract classes?, Comparison between interface and abstract classes, Concept of Nested Classes, What are Nested Classes?, Nested Classes Types, Working of an Inner Class, What is a Local Inner Class?, Anonymous Classes in Java, What is a Static Nested Class

Overview of Nested Classes

What are Nested Classes?, Types of Nested Classes, What is an Inner Class?, Understanding local inner class, Anonymous Inner Class, Nested Class – Static

Getting started with Java Threads

What is a Thread?, How to create and start a Thread?, States of a Thread, Blocking the Execution of a Thread, Concept of Sleep Thread, Understanding the priorities in a thread, Synchronisation in Java Threads, Interaction between threads

Overview of Java Collections

Introduction to Collection Framework, Preeminent Interfaces, What are Comparable and Comparator?, Working with Lists, Working with Maps, Working with Sets, Working with Queues

Understanding JDBC

Define JDBC, Different types of Drivers, How to access the drivers?, What is Connection in Java?, What is a Statement?, Explaining CRUD Operations with examples, Prepared Statement and Callable Statement

Java Generics

Overview of important topics included, Important and Frequently-Used Features, Defining Generic List, What is Generic Map in Java?, Java Generic Classes & Methods, For Loop Generic, What is Generic Wild Card?

Input/Output in Java

Brief Introduction, Learning about Input and output streams in java, Concept of byte Oriented Streams, Defining Character Oriented Streams?, Explain Object Serialisation, Input and Output Based on Channel

Getting started with Java Annotations

Introduction and Definition of Annotations, How they are useful for Java programmers?, Placements in Annotations, What are Built-in Java Annotations, Defining Custom Annotations

Reflection and its Usage

Getting started, Define Java Reflection?, What is a Class Object?, Concept of Constructors, Using Fields, Applying Methods, Implementing Annotations in Your Java Program

What projects will I be working on in this Java training?

Project – Library Management System

Problem Statement – Create a library management system project that includes the following functionalities:

Add book, Add Member, Issue Book, Return Book, Available Book etc.

What is Kafka – An Introduction

Understanding what is Apache Kafka, the various components and use cases of Kafka, implementing Kafka on a single node.

Multi Broker Kafka Implementation

Learning about the Kafka terminology, deploying single node Kafka with independent Zookeeper, adding replication in Kafka, working with Partitioning and Brokers, understanding Kafka consumers, the Kafka Writes terminology, various failure handling scenarios in Kafka.

Multi Node Cluster Setup

Introduction to multi node cluster setup in Kafka, the various administration commands, leadership balancing and partition rebalancing, graceful shutdown of Kafka Brokers and tasks, working with the Partition Reassignment Tool, cluster expansion, assigning Custom Partition, removing a Broker and increasing the Replication Factor of Partitions.

Integrate Flume with Kafka

Understanding the need for Kafka Integration, successfully integrating it with Apache Flume, steps in integration of Flume with Kafka as a Source.

Kafka API

Detailed understanding of the Kafka and Flume Integration, deploying Kafka as a Sink and as a Channel, introduction to PyKafka API and setting up the PyKafka Environment.

Producers & Consumers

Connecting Kafka using PyKafka, writing your own Kafka Producers and Consumers, writing a random JSON Producer, writing a Consumer to read the messages from a topic, writing and working with a File Reader Producer, writing a Consumer to store topics data into a file.

What projects will I be working on in this Kafka training?

Type : Multi Broker Kafka Implementation

Topics : In this project, you will learn about Apache Kafka, which is a platform for handling real-time data feeds. You will work with Kafka brokers, understand partitioning, Kafka consumers, the terminology used for Kafka writes, and failure handling in Kafka, and understand how to deploy a single node Kafka with independent Zookeeper. Upon completion of the project, you will gain considerable experience in working in a real-world scenario for processing streaming data within an enterprise infrastructure.

Module 1 - Introduction to SQL

1.1 Various types of databases
1.2 Introduction to Structured Query Language
1.3 Distinction between client server and file server databases
1.4 Understanding SQL Server Management Studio
1.5 SQL Table basics
1.6 Data types and functions
1.7 Transact-SQL
1.8 Authentication for Windows
1.9 Data control language
1.10 The identification of the keywords in T-SQL, such as Drop Table

Module 2 - Database Normalization and Entity Relationship Model

2.1 Data Anomalies
2.2 Update Anomalies
2.3 Insertion Anomalies
2.4 Deletion Anomalies
2.5 Types of Dependencies
2.6 Functional Dependency
2.7 Fully functional dependency
2.8 Partial functional dependency
2.9 Transitive functional dependency
2.10 Multi-valued functional dependency
2.11 Decomposition of tables
2.12 Lossy decomposition
2.13 Lossless decomposition
2.14 What is Normalization?
2.15 First Normal Form
2.16 Second Normal Form
2.17 Third Normal Form
2.18 Boyce-Codd Normal Form(BCNF)
2.19 Fourth Normal Form
2.20 Entity-Relationship Model
2.21 Entity and Entity Set
2.22 Attributes and types of Attributes
2.23 Entity Sets
2.24 Relationship Sets
2.25 Degree of Relationship
2.26 Mapping Cardinalities, One-to-One, One-to-Many, Many-to-one, Many-to-many
2.27 Symbols used in E-R Notation

Module 3 - SQL Operators

3.1 Introduction to relational databases
3.2 Fundamental concepts of relational rows, tables, and columns
3.3 Several operators (such as logical and relational), constraints, domains, indexes, stored procedures, primary and foreign keys
3.4 Understanding group functions
3.5 The unique key

Module 4 - Working with SQL: Join, Tables, and Variables

4.1 Advanced concepts of SQL tables
4.2 SQL functions
4.3 Operators & queries
4.4 Table creation
4.5 Data retrieval from tables
4.6 Combining rows from tables using inner, outer, cross, and self joins
4.7 Deploying operators such as ‘intersect,’ ‘except,’ and ‘union’
4.8 Temporary table creation
4.9 Set operator rules
4.10 Table variables
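
The joins and set operators in this module can be tried out quickly with Python's built-in `sqlite3` module. This is a hedged sketch: SQLite is used here only as a convenient stand-in for SQL Server, and the `customers` and `orders` tables and their rows are invented for illustration.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# Inner join: only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# Left outer join: every customer, with an order count of 0 where nothing matches.
left_rows = conn.execute("""
    SELECT c.name, COUNT(o.id)
    FROM customers c
    LEFT OUTER JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

# Set operator: customer IDs that never appear in the orders table.
no_order_ids = conn.execute("""
    SELECT id FROM customers
    EXCEPT
    SELECT customer_id FROM orders
""").fetchall()
```

The inner join drops the customer without orders, while the left outer join keeps that customer with a count of zero. `EXCEPT` finds the same customer from the set-operator side.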

Module 5 - Deep Dive into SQL Functions

5.1 Understanding SQL functions – what do they do?
5.2 Scalar functions
5.3 Aggregate functions
5.4 Functions that can be used on different datasets, such as numbers, characters, strings, and dates
5.5 Inline SQL functions
5.6 General functions
5.7 Duplicate functions

Module 6 - Working with Subqueries

6.1 Understanding SQL subqueries, their rules
6.2 Statements and operators with which subqueries can be used
6.3 Using the set clause to modify subqueries
6.4 Understanding different types of subqueries, such as where, select, insert, update, delete, etc.
6.5 Methods to create and view subqueries

Module 7 - SQL Views, Functions, and Stored Procedures

7.1 Learning SQL views
7.2 Methods of creating, using, altering, renaming, dropping, and modifying views
7.3 Understanding stored procedures and their key benefits
7.4 Working with stored procedures
7.5 Studying user-defined functions
7.6 Error handling

Module 8 - Deep Dive into User-defined Functions

8.1 User-defined functions
8.2 Types of UDFs, such as scalar
8.3 Inline table value
8.4 Multi-statement table
8.5 Stored procedures and when to deploy them
8.6 What is rank function?
8.7 Triggers, and when to execute triggers?

Module 9 - SQL Optimization and Performance

9.1 SQL Server Management Studio
9.2 Using pivot in MS Excel and MS SQL Server
9.3 Differentiating between Char, Varchar, and NVarchar
9.4 XML path, indexes and their creation
9.5 Records grouping, advantages, searching, sorting, modifying data
9.6 Clustered indexes creation
9.7 Use of indexes to cover queries
9.8 Common table expressions
9.9 Index guidelines

Module 10 - Managing Data with Transact-SQL

10.1 Creating Transact-SQL queries
10.2 Querying multiple tables using joins
10.3 Implementing functions and aggregating data
10.4 Modifying data
10.5 Determining the results of DDL statements on supplied tables and data
10.6 Constructing DML statements using the output statement

Module 11 - Querying Data with Advanced Transact-SQL Components

11.1 Querying data using subqueries and APPLY
11.2 Querying data using table expressions
11.3 Grouping and pivoting data using queries
11.4 Querying temporal data and non-relational data
11.5 Constructing recursive table expressions to meet business requirements
11.6 Using windowing functions to group
11.7 Rank the results of a query
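
A recursive table expression (topic 11.5) can be sketched with Python's built-in `sqlite3` module. The `employees` table and its rows are hypothetical, and SQLite is only a stand-in: SQLite accepts the `RECURSIVE` keyword, whereas in Transact-SQL a recursive common table expression is written with plain `WITH`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Dana', NULL),
        (2, 'Eli', 1),
        (3, 'Farah', 2),
        (4, 'Gus', 1);
""")

# Walk the reporting hierarchy from the top-level manager downwards.
rows = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        -- Anchor member: employees with no manager.
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: direct reports of rows already in the chain.
        SELECT e.id, e.name, chain.depth + 1
        FROM employees e
        JOIN chain ON e.manager_id = chain.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
""").fetchall()
```

Each row comes back with its depth in the hierarchy, which is a common way to answer "who reports to whom" style business questions.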

Module 12 - Programming Databases Using Transact-SQL

12.1 Creating database programmability objects by using T-SQL
12.2 Implementing error handling and transactions
12.3 Implementing transaction control in conjunction with error handling in stored procedures
12.4 Implementing data types and NULL

Module 13 - Designing and Implementing Database Objects

13.1 Designing and implementing relational database schema
13.2 Designing and implementing indexes
13.3 Learning to compare between indexed and included columns
13.4 Implementing clustered index
13.5 Designing and deploying views
13.6 Column store views

Module 14 - Implementing Programmability Objects

14.1 Explaining foreign key constraints
14.2 Using T-SQL statements
14.3 Usage of Data Manipulation Language (DML)
14.4 Designing the components of stored procedures
14.5 Implementing input and output parameters
14.6 Applying error handling
14.7 Executing control logic in stored procedures
14.8 Designing trigger logic, DDL triggers, etc.

Module 15 - Managing Database Concurrency

15.1 Applying transactions
15.2 Using the transaction behavior to identify DML statements
15.3 Learning about implicit and explicit transactions
15.4 Isolation levels management
15.5 Understanding concurrency and locking behavior
15.6 Using memory-optimized tables

Module 16 - Optimizing Database Objects

16.1 Accuracy of statistics
16.2 Formulating statistics maintenance tasks
16.3 Dynamic management objects management
16.4 Identifying missing indexes
16.5 Examining and troubleshooting query plans
16.6 Consolidating the overlapping indexes
16.7 The performance management of database instances
16.8 SQL server performance monitoring

Module 17 - Advanced Topics

17.1 Correlated Subquery, Grouping Sets, Rollup, Cube

Hands-on Exercise

  1. Implementing Correlated Subqueries
  2. Using EXISTS with a Correlated subquery
  3. Using Union Query
  4. Using Grouping Set Query
  5. Using Rollup
  6. Using CUBE to generate four grouping sets
  7. Perform a partial CUBE
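
The first two exercises can be previewed with Python's built-in `sqlite3` module. This is a sketch with a hypothetical `employees` table. SQLite is used only as a stand-in for SQL Server, and the GROUPING SETS, ROLLUP, and CUBE exercises are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Dana', 'eng', 120),
        (2, 'Eli', 'eng', 100),
        (3, 'Farah', 'ops', 90),
        (4, 'Gus', 'ops', 95),
        (5, 'Hui', 'hr', 80);
""")

# Correlated subquery: the inner query is re-evaluated for each outer row's department.
above_dept_average = conn.execute("""
    SELECT name
    FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept = e.dept)
    ORDER BY name
""").fetchall()

# EXISTS with a correlated subquery: employees who have a higher-paid colleague
# in the same department.
has_higher_paid_colleague = conn.execute("""
    SELECT name
    FROM employees e
    WHERE EXISTS (
        SELECT 1 FROM employees o
        WHERE o.dept = e.dept AND o.salary > e.salary
    )
    ORDER BY name
""").fetchall()
```

In both queries the subquery refers to the outer alias `e`, which is what makes it correlated rather than a standalone subquery.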

Module 18 - Microsoft Courses: Study Material

18.1 Performance Tuning and Optimizing SQL Databases
18.2 Querying Data with Transact-SQL

SQL Projects

Writing Complex Subqueries

In this project, you will be working with SQL subqueries and utilizing them in various scenarios. You will learn to use IN or NOT IN, ANY or ALL, EXISTS or NOT EXISTS, and other major queries. You will be required to access and manipulate datasets, operate and control statements in SQL, execute queries in SQL against databases.

Querying a Large Relational Database

This project is about how to get details about customers by querying the database. You will be working with Table basics and data types, various SQL operators, and SQL functions. The project will require you to download a database and restore it on the server, query the database for customer details and sales information.

Relational Database Design

In this project, you will learn to convert a relational design that has enlisted within its various users, user roles, user accounts, and their statuses into a table in SQL Server. You will have to define relations/attributes, primary keys, and create respective foreign keys with at least two rows in each of the tables.

Certification

This is a comprehensive course that is designed to clear multiple certifications, namely:

  • CCA Spark and Hadoop Developer (CCA175)
  • Splunk Certified Power User Certification
  • Splunk Certified Admin Certification
  • Tableau Desktop Qualified Associate Exam
  • SAS Certified Base Programmer Exam
  • C100DEV: MongoDB Certified Developer Associate Exam
  • Apache Cassandra DataStax Certification
  • Linux Foundation Linux Certification
  • Java SE Programmer Certification

The entire training course content is in line with respective certification exams. At the end of this training course, there will be quizzes that perfectly reflect the type of questions asked in the respective certification exams and help you score better marks. Intellipaat Course Completion certificate will be awarded upon the completion of the project work (after expert review) and upon scoring at least 60% marks in the quiz. Intellipaat certification is well recognized in top 80+ MNCs like Ericsson, Cisco, Cognizant, Sony, Mu Sigma, Saint-Gobain, Standard Chartered, TCS, Genpact, Hexaware, etc.

Our alumni work at 3000+ top companies

Frequently Asked Questions

What is Intellipaat’s master's course and how is it different from individual courses?

Intellipaat’s master’s course is a structured learning path especially designed by industry experts and ensures that you transform into a Big Data and Data Science expert. Individual courses at Intellipaat focus on one or two specializations. However, if you have to master Big Data and Data Science, then this program is for you.

At Intellipaat, you can enroll in either the instructor-led online training or self-paced training. Apart from this, Intellipaat also offers corporate training for organizations to upskill their workforce. All trainers at Intellipaat have 12+ years of relevant industry experience, and they have been actively working as consultants in the same domain, which has made them subject matter experts. Go through the sample videos to check the quality of our trainers.

Intellipaat offers 24/7 query resolution, and you can raise a ticket with the dedicated support team at any time. You can use email support for all your queries. If your query does not get resolved through email, we can also arrange one-on-one sessions with our trainers.

You would be glad to know that you can contact Intellipaat support even after the completion of the training. We also do not put a limit on the number of tickets you can raise for query resolution and doubt clearance.

Intellipaat offers up-to-date, relevant, and high-value real-world projects as part of the training program. This way, you can apply what you have learned in a real-world industry setup. All training comes with multiple projects that thoroughly test your skills, learning, and practical knowledge, making you industry-ready.

You will work on highly exciting projects in the domains of high technology, ecommerce, marketing, sales, networking, banking, insurance, etc. After completing the projects successfully, your skills will be equal to 6 months of rigorous industry experience.

Intellipaat actively provides placement assistance to all learners who have successfully completed the training. For this, we are tied up with over 80 top MNCs from around the world. This way, you can be placed in outstanding organizations such as Sony, Ericsson, TCS, Mu Sigma, Standard Chartered, Cognizant, and Cisco, among other equally great enterprises. We also help you with job interview and résumé preparation.

You can definitely make the switch from self-paced training to online instructor-led training by simply paying the extra amount. You can join the very next batch, which will be duly notified to you.

Once you complete Intellipaat’s training program, including the real-world projects, quizzes, and assignments, and score at least 60 percent marks in the qualifying exam, you will be awarded Intellipaat’s course completion certificate. This certificate is very well recognized in Intellipaat-affiliated organizations, including over 80 top MNCs from around the world and some of the Fortune 500 companies.

Apparently, no. Our job assistance program is aimed at helping you land your dream job. It offers a potential opportunity for you to explore various competitive openings in the corporate world and find a well-paid job matching your profile. The final decision on hiring will always be based on your performance in the interview and the requirements of the recruiter.
