Data Science Architect Master’s Program

Master Program

Our Data Science Architect master's course lets you gain proficiency in Data Science. You will work on real-world projects in Data Science with R, Hadoop Dev, Admin, Test and Analysis, Apache Spark, Scala, Deep Learning, Tableau, Data Science with SAS, SQL, MongoDB and more. In this program, you will cover 10 courses and 53 industry-based projects, plus 1 capstone project. As part of the online classroom training, you will receive five additional self-paced courses co-created with IBM, namely Deep Learning with TensorFlow, Build Chatbots with Watson Assistant, R for Data Science, Spark MLlib, and Python for Data Science. You will also get exclusive access to the IBM Watson Cloud Lab for the Chatbots course.

  • 10+ Courses
  • 53+ Projects
  • 232 Hours

What You Will Learn: 10 Courses

  • Online Classroom Training

    • Course 1
      Data Science With R
    • Course 2
      Python for Data Science
    • Course 3
      Machine Learning
    • Course 4
      AI & Deep Learning
    • Course 5
      Big Data Hadoop & Spark
    • Course 6
      Tableau Desktop 10
    • Course 7
      Data Science with SAS
  • Self Paced Training

    • Course 8
      Advanced Excel
    • Course 9
      MongoDB 
    • Course 10
      MS-SQL 
  • Get Master's Certificate

Key Features

232 Hrs Instructor Led Training
104 Hrs Self-paced Videos
253 Hrs Project work & Exercises
Certification and Job Assistance
Flexible Schedule
Lifetime Free Upgrade
24 x 7 Lifetime Support & Access

Course Fees

Self Paced Training

  • 104 Hrs e-learning videos
  • Lifetime Free Upgrade
  • 24 x 7 Lifetime Support & Access
  • Flexi-scheduling
$702

Online Classroom preferred

  • Everything in self-paced, plus
  • 232 Hrs of instructor-led training
  • 1:1 doubt resolution sessions
  • Attend as many batches as you like, for a lifetime
  • Flexible Schedule
  • 16 Aug, SAT - SUN, 08:00 PM to 11:00 PM IST (GMT +5:30)
  • 22 Aug, SAT - SUN, 08:00 PM to 11:00 PM IST (GMT +5:30)
  • 25 Aug, TUE - FRI, 07:00 AM to 09:00 AM IST (GMT +5:30)
  • 30 Aug, SAT - SUN, 08:00 PM to 11:00 PM IST (GMT +5:30)
$1,499 $1,099 (10% OFF)

Corporate Training

  • Customized Learning
  • Enterprise grade learning management system (LMS)
  • 24x7 support
  • Strong Reporting

Overview

The Intellipaat Data Science Architect master’s course will give you in-depth knowledge of Data Science, real-time analytics, statistical computing, SQL, parsing machine-generated data, and finally the domain of Deep Learning in Artificial Intelligence. In this program, you will also learn how to leverage Big Data Analytics with Spark for Data Science. The program is designed by industry experts and includes 10 courses with 53 industry-based projects.

List of Courses Included

Online Instructor-led Courses:

  • Data Science with R
  • Python for Data Science
  • Machine Learning
  • Artificial Intelligence and Deep Learning with TensorFlow
  • Big Data Hadoop & Spark
  • Tableau Desktop 10
  • Data Science with SAS

Self-paced Courses:

  • Advanced Excel
  • MongoDB
  • MS-SQL

What you will learn:

  • MapReduce and HDFS
  • Real-time analytics with Spark
  • Data Scientist roles and responsibilities
  • Prediction and analysis through clustering
  • Deploying the recommender system
  • SAS advanced analytics and R programming
  • Linear and logistic regression
  • Making sense of NoSQL data
  • Deep Learning model in AI

Who should take up this training:

  • Data Scientists, Machine Learning Professionals and Software Developers
  • Business Intelligence Professionals, Information Architects and Project Managers
  • Those looking to be a Data Science Architect

There are no prerequisites for taking up this training program.

  • Data Scientist is the best job of the 21st century – Harvard Business Review
  • Global Big Data market to reach $122 billion in revenue by 2025 – Frost & Sullivan

This Intellipaat training program has been created with the needs of the Data Science industry in mind. Today’s Data Scientists need a diverse set of skills, which include working with huge volumes of data, parsing that data, and converting it into an easily understandable format from which business insights can be derived. This training program lets you play multiple roles in the Big Data and Data Science domains and get hired for top-notch salaries.

Testimonials

Bhanukumar Muppalla

Software Engineer at DXC Technology

The Data Science training includes a lot of constituent components, and the Intellipaat Data Science training provided the most comprehensive and in-depth learning experience. I really liked the projects in Data Science, which were real-world projects, that helped me take on a Data Science role in the real world much easier.

Shreyash Limbhetwala

Technical Delivery Lead

I want to talk about the rich LMS that Intellipaat Data Science training offered. The extensive set of PPTs, PDFs and other related course material were of highest quality, and due to this my learning with Intellipaat was excellent. I could also clear the Cloudera Data Scientist certification in the first attempt.

Anthony Crenshaw

Master Radio Electronic Communication Officer

I am glad that I took the Intellipaat Spark training. The trainers offered quality Spark training with real-world examples, and there was extensive interactivity throughout the training which made the Intellipaat training the best according to me.

Swetha Pandit

Big Data Developer at Accenture

Their courses are well structured and taught by recognized professionals. I have found the videos to be of excellent quality. Thanks!

Vaishnavi Vyas

Transportation Specialist at Amazon

Hello All, The course that I took was Data Science with R. I had a very positive experience on the forum. They have good tutors and a remarkable service in terms of clearing the doubts and issues related to the course. To sum it up, it was a good learning experience.

Satya

Sr. Manager at Cognizant Technology Solutions

Overall, the quality of the material, the faculty, and the way the courses are designed, conducted, and managed are outstanding. Intellipaat follows a very disciplined way of conducting the classes. The operations team sends timely reminders. Help and support are good.

Koushik

Student

Instructors have a good grip on the subject. The support team was available whenever I needed them. Value for money.

Indira Vemuri

Data Scientist

The course material is of good quality and curated by industry experts. I got my classes scheduled as per my request. I am completely satisfied with the quality training. Overall, I had very good experience while taking up the training.

Adegboyega During

Keystone Bank Limited

The classes on each module were conducted step-by-step. All my queries were addressed very clearly. I suggest this is the best course for beginners. The Data Science projects were really worth working on.

Suman Galla

Infrastructure Stability Engineer, Invesco

I loved the way the trainer took the classes systematically. The content was great. He made all tough concepts easy for me to understand. The course was very much beneficial for me.

Pradnya Phutane

Salesforce Sr. Consultant & Administrator

I am glad that I got an opportunity to get trained by Intellipaat. The entire course was highly contented. The trainers were very patient in answering my questions. Overall, it is the best course ever.

Vivek Daga

Lead Business Analyst

The training schedule is very accurate: classes end on time, and the trainer has plenty of time for clearing doubts. I am satisfied with the videos and content given. The instructor is very intellectual and addresses every question.

Course Content

Module 01 - Introduction to Data Science with R

1.1 What is Data Science?
1.2 Significance of Data Science in today’s data-driven world, applications of Data Science, lifecycle of Data Science, and its components
1.3 Introduction to Big Data Hadoop, Machine Learning, and Deep Learning
1.4 Introduction to R programming and RStudio

Hands-on Exercise:

1. Installation of RStudio
2. Implementing simple mathematical operations and logic using R operators, loops, if statements, and switch cases

Module 02 - Data Exploration

2.1 Introduction to data exploration
2.2 Importing and exporting data to/from external sources
2.3 What are exploratory data analysis and data importing?
2.4 DataFrames, working with them, accessing individual elements, vectors, factors, operators, in-built functions, conditional and looping statements, user-defined functions, and data types

Hands-on Exercise:

1. Accessing individual elements of customer churn data
2. Modifying and extracting results from the dataset using user-defined functions in R

Module 03 - Data Manipulation

3.1 Need for data manipulation
3.2 Introduction to the dplyr package
3.3 Selecting one or more columns with select(), filtering records on the basis of a condition with filter(), adding new columns with mutate(), sampling, and counting
3.4 Combining different functions with the pipe operator and implementing SQL-like operations with sqldf

Hands-on Exercise:

1. Implementing dplyr
2. Performing various operations for manipulating data and storing it

Module 04 - Data Visualization

4.1 Introduction to visualization
4.2 Different types of graphs, the grammar of graphics, the ggplot2 package, categorical distribution with geom_bar(), numerical distribution with geom_histogram(), building frequency polygons with geom_freqpoly(), and making a scatterplot with geom_point()
4.3 Multivariate analysis with geom_boxplot
4.4 Univariate analysis with a barplot, a histogram and a density plot, and multivariate distribution
4.5 Creating barplots for categorical variables using geom_bar(), and adding themes with the theme() layer
4.6 Visualization with plotly, frequency plots with geom_freqpoly(), multivariate distribution with scatter plots and smooth lines, continuous distribution vs categorical distribution with box-plots, and sub grouping plots
4.7 Working with co-ordinates and themes to make graphs more presentable, understanding plotly and various plots, and visualization with ggvis
4.8 Geographic visualization with ggmap() and building web applications with Shiny

Hands-on Exercise:

1. Creating data visualization to understand the customer churn ratio using ggplot2 charts
2. Using plotly for importing and analyzing data
3. Visualizing tenure, monthly charges, total charges, and other individual columns using a scatter plot

Module 05 - Introduction to Statistics

5.1 Why do we need statistics?
5.2 Categories of statistics, statistical terminology, types of data, measures of central tendency, and measures of spread
5.3 Correlation and covariance, standardization and normalization, probability and its types, hypothesis testing, chi-square testing, ANOVA, normal distribution, and binomial distribution

Hands-on Exercise:

1. Building a statistical analysis model that uses quantification, representations, and experimental data
2. Reviewing, analyzing, and drawing conclusions from the data

Module 06 - Machine Learning

6.1 Introduction to Machine Learning
6.2 Introduction to linear regression, predictive modeling, simple linear regression vs multiple linear regression, concepts, formulas, assumptions, and residuals in Linear Regression, and building a simple linear model
6.3 Predicting results and finding the p-value and an introduction to logistic regression
6.4 Comparing linear regression with logistic regression and bivariate logistic regression with multivariate logistic regression
6.5 Confusion matrix and the accuracy of a model, understanding the fit of the model, threshold evaluation with ROCR, and using qqnorm() and qqline()
6.6 Understanding the summary results with null hypothesis, F-statistic, and building linear models with multiple independent variables

Hands-on Exercise:

1. Modeling the relationship within data using linear predictor functions
2. Implementing linear and logistic regression in R by building a model with ‘tenure’ as the dependent variable

Module 07 - Logistic Regression

7.1 Introduction to logistic regression
7.2 Logistic regression concepts, linear vs logistic regression, and math behind logistic regression
7.3 Detailed formulas, logit function and odds, bivariate logistic regression, and Poisson regression
7.4 Building a simple binomial model and predicting the result, making a confusion matrix for evaluating the accuracy, true positive rate, false positive rate, and threshold evaluation with ROCR
7.5 Finding out the right threshold by building the ROC plot, cross validation, multivariate logistic regression, and building logistic models with multiple independent variables
7.6 Real-life applications of logistic regression

Hands-on Exercise:

1. Implementing predictive analytics by describing data
2. Explaining the relationship between one dependent binary variable and one or more binary variables
3. Using glm() to build a model, with ‘Churn’ as the dependent variable

Module 08 - Decision Trees and Random Forest

8.1 What is classification? Different classification techniques
8.2 Introduction to decision trees
8.3 Algorithm for decision tree induction and building a decision tree in R
8.4 Confusion matrix and regression trees vs classification trees
8.5 Introduction to bagging
8.6 Random forest and implementing it in R
8.7 What is Naive Bayes? Computing probabilities
8.8 Understanding the concepts of Impurity function, Entropy, Gini index, and Information gain for the right split of node
8.9 Overfitting, pruning, pre-pruning, post-pruning, and cost-complexity pruning, pruning a decision tree and predicting values, finding out the right number of trees, and evaluating performance metrics

Hands-on Exercise:

1. Implementing random forest for both regression and classification problems
2. Building a tree, pruning it using ‘churn’ as the dependent variable, and building a random forest with the right number of trees
3. Using ROCR for performance metrics

Module 09 - Unsupervised Learning

9.1 What is Clustering? Its use cases
9.2 What is k-means clustering? What is canopy clustering?
9.3 What is hierarchical clustering?
9.4 Introduction to unsupervised learning
9.5 Feature extraction, clustering algorithms, and the k-means clustering algorithm
9.6 Theoretical aspects of k-means, k-means process flow, k-means in R, implementing k-means, and finding out the right number of clusters using a scree plot
9.7 Dendrograms, understanding hierarchical clustering, and implementing it in R
9.8 Explanation of Principal Component Analysis (PCA) in detail and implementing PCA in R

Hands-on Exercise:

1. Deploying unsupervised learning with R to achieve clustering and dimensionality reduction
2. K-means clustering for visualizing and interpreting results for the customer churn data

Module 10 - Association Rule Mining and Recommendation Engines

10.1 Introduction to association rule mining and Market Basket Analysis (MBA)
10.2 Measures of association rule mining: Support, confidence, lift, and apriori algorithm, and implementing them in R
10.3 Introduction to recommendation engines
10.4 User-based collaborative filtering and item-based collaborative filtering, and implementing a recommendation engine in R
10.5 Recommendation engine use cases

Hands-on Exercise:

1. Deploying association analysis as a rule-based Machine Learning method
2. Identifying strong rules discovered in databases with measures based on interesting discoveries

Self-paced Course Content

Module 11 - Introduction to Artificial Intelligence

11.1 Introducing Artificial Intelligence and Deep Learning
11.2 What is an artificial neural network? TensorFlow: The computational framework for building AI models
11.3 Fundamentals of building ANN using TensorFlow and working with TensorFlow in R

Module 12 - Time Series Analysis

12.1 What is a time series? The techniques, applications, and components of time series
12.2 Moving average, smoothing techniques, and exponential smoothing
12.3 Univariate time series models and multivariate time series analysis
12.4 ARIMA model
12.5 Time series in R, sentiment analysis in R (Twitter sentiment analysis), and text analysis

Hands-on Exercise:

1. Analyzing time series data
2. Analyzing the sequence of measurements that follow a non-random order to identify the nature of phenomenon and forecast the future values in the series

Module 13 - Support Vector Machine (SVM)

13.1 Introduction to Support Vector Machine (SVM)
13.2 Data classification using SVM
13.3 SVM algorithms using separable and inseparable cases
13.4 Linear SVM for identifying margin hyperplane

Module 14 - Naïve Bayes

14.1 What is the Bayes theorem?
14.2 What is Naïve Bayes Classifier?
14.3 Classification Workflow
14.4 How Naive Bayes classifier works and classifier building in Scikit-Learn
14.5 Building a probabilistic classification model using Naïve Bayes and the zero probability problem

Module 15 - Text Mining

15.1 Introduction to the concepts of text mining
15.2 Text mining use cases, and understanding and manipulating text with ‘tm’ and ‘stringr’
15.3 Text mining algorithms and the quantification of the text
15.4 TF-IDF and after TF-IDF

Case Studies

Case Study 01: Market Basket Analysis (MBA)

1.1 This case study is associated with the modeling technique of Market Basket Analysis, where you will learn about loading data, plotting items, and running algorithms.
1.2 It includes finding out the items that go hand in hand and can be clubbed together.
1.3 This is used for various real-world scenarios like a supermarket shopping cart and so on.

Case Study 02: Logistic Regression

2.1 In this case study, you will get a detailed understanding of the advertisement spends of a company that will help drive more sales.
2.2 You will deploy logistic regression to forecast future trends.
2.3 You will detect patterns and uncover insight using the power of R programming.
2.4 Due to this, the future advertisement spends can be decided and optimized for higher revenues.

Case Study 03: Multiple Regression

3.1 You will understand how to compare the miles per gallon (MPG) of a car based on various parameters.
3.2 You will deploy multiple regression and note down the MPG for car make, model, speed, load conditions, etc.
3.3 The case study includes model building, model diagnostics, and checking the ROC curve, among other things.

Case Study 04: Receiver Operating Characteristic (ROC)

4.1 In this case study, you will work with various datasets in R.
4.2 You will deploy data exploration methodologies.
4.3 You will also build scalable models.
4.4 Besides, you will predict the outcome with the highest precision, diagnose the model that you have created with real-world data, and check the ROC curve.

What projects will I be working on in this Data Science certification course?

Project 01: Market Basket Analysis

Domain: Inventory Management

Problem Statement: As a new manager in the company, you are assigned the task of increasing cross selling

Topics: Association rule mining, data extraction, and data manipulation

Highlights:

  • Performing association rule mining
  • Understanding where to implement the apriori algorithm
  • Setting association rules with respect to confidence

Project 02: Credit Card Fraud Detection

Domain: Banking

Problem Statement: Analyze the probability of being involved in a fraudulent operation

Topics: Algorithms, V17 predictor, data visualization, and R

Highlights:

  • Working with the credit card dataset
  • Performing data analysis on various labels in the data
  • Making use of V17 as predictor and using V14 for analysis
  • Plotting score performance with respect to variables

Project 03: Data Cleaning Using the Census Dataset

Domain: Government

Problem Statement: Perform data cleansing on the raw dataset

Topics: Data analysis, data preprocessing, cleaning ops, data visualization, and R

Highlights:

  • Working with the census dataset
  • Changing a label to perform analysis
  • Creation of functions to eliminate values that are not required
  • Verifying the completion of data cleansing

Project 04: Loan Approval Prediction

Domain: Banking

Problem Statement: Predict the approval rate of a loan by using multiple labels

Topics: Data analysis, data preprocessing, cleaning ops, data visualization, and R

Highlights:

  • Performing data preprocessing
  • Building a model and applying PCA
  • Building a Naïve Bayes model on the training dataset
  • Prediction of values after performing analysis

Project 05: Designing a Book Recommendation System

Domain: Ecommerce

Problem Statement: Create a model, which can recommend books, based on user interest

Topics: Data cleaning, data visualization, and user-based collaborative filtering

Highlights:

  • Finding the most popular books using various techniques
  • Creating a book recommender model using user-based collaborative filtering

Project 06: Netflix Recommendation System

Domain: Ecommerce

Problem Statement: Simulate the Netflix recommendation system

Topics: Data cleaning, data visualization, distribution, and recommenderlab

Highlights:

  • Working with raw data
  • Using the recommenderlab library in R
  • Making use of real data from Netflix

Project 07: Creating a Pokemon Game Using Machine Learning

Domain: Gaming

Problem Statement: Create a game engine for Pokemon using Machine Learning

Topics: Decision trees, regression, data cleaning, and data visualization

Highlights:

  • Predicting which Pokemon will win based on ‘Attack vs Defense’
  • Finding whether a Pokemon is legendary using decision trees
  • Understanding the dynamics of decision-making in Machine Learning

Case Study 01: Introduction to R Programming

Problem Statement: Working with various operators in R

Topics: Arithmetic operators, relational operators, and logical operators

Highlights:

  • Working with arithmetic operators
  • Working with relational operators
  • Working with logical operators

Case Study 02: Solving Customer Churn Using Data Exploration

Problem Statement: Understanding what to do to reduce customer churn using data exploration

Topics: Data Exploration

Highlights:

  • Extracting individual columns
  • Creating and applying filters to manipulate data
  • Using loops for redundant operations

Case Study 03: Creating Data Structures in R

Problem Statement: Implementing various data structures in R for various scenarios

Topics: Vectors, lists, matrices, and arrays

Highlights:

  • Creating and implementing vectors
  • Understanding lists
  • Using arrays to store matrices
  • Creating and implementing matrices

Case Study 04: Implementing SVD in R

Problem Statement: Understanding the use of singular value decomposition (SVD) in R by making use of the MovieLense dataset

Topics: 5-fold cross validation and realRatingMatrix

Highlights:

  • Creating custom recommended movie sets for each user
  • Creating a user-based collaborative filtering model
  • Creating realRatingMatrix for movie recommendation

Case Study 05: Time Series Analysis

Problem Statement: Performing TSA and understanding the concepts of ARIMA for a given scenario

Topics: Time series analysis, R language, data visualization, and the ARIMA model

Highlights:

  • Understanding how to fit an ARIMA model
  • Plotting PACF charts and finding optimal parameters
  • Building the ARIMA model
  • Prediction of values after performing analysis

Module 01 - Python Environment Setup and Essentials

1.1 Introduction to Python Language
1.2 Features, the advantages of Python over other programming languages
1.3 Python installation – Windows, Mac & Linux distribution for Anaconda Python
1.4 Deploying Python IDE
1.5 Basic Python commands, data types, variables, keywords and more

Hands-on Exercise – Installing Anaconda Python on Windows, Linux, and Mac.

Module 02 - Python language Basic Constructs

2.1 Built-in data types in Python
2.2 Classes, modules, str (string), the Ellipsis object, the Null object (None), and Debug
2.3 Basic operators: comparison, arithmetic, slicing and the slice operator, logical, and bitwise
2.4 Loop and control statements: while, for, if, break, else, and continue

Hands-on Exercise –
1. Write your first Python program
2. Write a Python Function (with and without parameters)
3. Use Lambda expression
4. Write a class
5. Create a member function and a variable
6. Create an object
7. Write a for loop
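
The exercises listed above fit in a few lines of Python. The sketch below is only an illustration, not course material; the names `greet`, `square`, and `Tally` are hypothetical.

```python
# Hypothetical names throughout; a sketch of the exercise list, not course code.

def greet(name="world"):
    """A function with an optional parameter (call it with or without one)."""
    return f"Hello, {name}!"

# A lambda expression: an anonymous, single-expression function.
square = lambda x: x * x

class Tally:
    """A class with a member variable and a member function."""

    def __init__(self):
        self.total = 0              # member variable

    def add(self, amount):
        """Member function that updates the variable and returns it."""
        self.total += amount
        return self.total

tally = Tally()                     # create an object
for n in range(1, 4):               # a for loop over 1, 2, 3
    tally.add(square(n))            # adds 1, 4, then 9

print(greet(), greet("Python"), tally.total)
```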

Module 03 - OOP concepts in Python

3.1 How to write object-oriented programs in Python
3.2 Connecting to a database
3.3 Classes and objects in Python
3.4 OOPs paradigm, important concepts in OOP like polymorphism, inheritance, encapsulation
3.5 Python functions, return types and parameters
3.6 Lambda expressions

Hands-on Exercise –
1. Creating an application that checks the balance, deposits money, and withdraws money, using the concepts of OOP.
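
One possible shape for such an application is sketched below. It is an assumption about how the exercise might be approached, not the course's solution; the class names `BankAccount`, `SavingsAccount`, and `InsufficientFunds` and the 2% rate are invented for illustration.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal exceeds the balance (hypothetical name)."""


class BankAccount:
    """Encapsulation: the balance changes only through deposit/withdraw."""

    def __init__(self, owner, balance=0.0):
        self.owner = owner
        self._balance = float(balance)   # leading underscore: treat as private

    @property
    def balance(self):
        """Read-only view of the balance (the 'check balance' operation)."""
        return self._balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount
        return self._balance

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("withdrawal must be positive")
        if amount > self._balance:
            raise InsufficientFunds(f"balance is only {self._balance:.2f}")
        self._balance -= amount
        return self._balance


class SavingsAccount(BankAccount):
    """Inheritance: reuses BankAccount and adds interest."""

    def __init__(self, owner, balance=0.0, rate=0.02):
        super().__init__(owner, balance)
        self.rate = rate                 # illustrative interest rate

    def add_interest(self):
        return self.deposit(self._balance * self.rate)


account = SavingsAccount("Asha", balance=100)
account.deposit(50)
account.withdraw(30)
account.add_interest()
print(account.balance)
```

Keeping the balance behind a property means callers cannot set it directly, which is the encapsulation idea named in the module outline.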

Module 04 - Database connection

4.1 Understanding the Database, need of database
4.2 Installing MySQL on Windows
4.3 Understanding Database connection using Python.

Hands-on Exercise – Demo on database connection using Python and pulling the data.
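
The connect, execute, and fetch cycle can be sketched as follows. Note the substitution: the module uses MySQL, but this sketch uses Python's built-in `sqlite3` module with an in-memory database so that it runs without a server. Both follow the Python DB-API, so the flow is similar, although MySQL drivers typically use `%s` placeholders where `sqlite3` uses `?`. The table and rows are made up.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
try:
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    cur.executemany(
        "INSERT INTO customers (name, city) VALUES (?, ?)",
        [("Asha", "Pune"), ("Ben", "Leeds"), ("Chen", "Pune")],
    )
    conn.commit()

    # Parameterized query: values are passed separately from the SQL text,
    # which avoids SQL injection from string concatenation.
    cur.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Pune",)
    )
    pune_customers = [row[0] for row in cur.fetchall()]
finally:
    conn.close()        # always release the connection

print(pune_customers)
```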

Module 05 - NumPy for mathematical computing

5.1 Introduction to arrays and matrices
5.2 Broadcasting of array math, indexing of array
5.3 Standard deviation, conditional probability, correlation and covariance.

Hands-on Exercise –
1. How to import NumPy module
2. Creating an array using ndarray
3. Calculating standard deviation on array of numbers
4. Calculating correlation between two variables.
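
A short sketch of these four steps, using made-up numbers (the `hours` and `scores` arrays are hypothetical):

```python
import numpy as np

# Made-up data for illustration only.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([52.0, 55.0, 61.0, 70.0, 72.0])

# Population standard deviation (ddof=0 is NumPy's default);
# pass ddof=1 for the sample standard deviation.
std_population = hours.std()
std_sample = hours.std(ddof=1)

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between the two variables.
r = np.corrcoef(hours, scores)[0, 1]

# Broadcasting: the scalar mean is applied across the whole array.
centered = hours - hours.mean()

print(std_population, std_sample, r, centered)
```

The `ddof` argument is worth noticing: NumPy's default divides by n, while many statistics texts and tools divide by n - 1.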

Module 06 - SciPy for scientific computing

6.1 Introduction to SciPy
6.2 Functions building on top of NumPy, cluster, linalg, signal, optimize, integrate, subpackages, SciPy with Bayes Theorem.

Hands-on Exercise –
1. Importing of SciPy
2. Applying the Bayes theorem on the given dataset.
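
As a worked example of the Bayes theorem itself, the sketch below uses plain Python rather than SciPy, and the probabilities are invented for illustration. It computes P(H | E) = P(E | H) P(H) / P(E), where P(E) is expanded over H and not-H.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' theorem for a binary hypothesis H and evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Total probability of the evidence, P(E).
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Made-up numbers: 1% of customers churn; an alert fires for 90% of
# churners and for 5% of non-churners.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 4))
```

Even with a fairly accurate alert, the posterior stays low here because the prior is small, which is the usual base-rate lesson of this formula.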

Module 07 - Matplotlib for data visualization

7.1 How to plot graph and chart with Python
7.2 Various aspects of line, scatter, bar, histogram, 3D, the API of MatPlotLib, subplots.

Hands-on Exercise –
1. Deploying MatPlotLib for creating Pie, Scatter, Line, Histogram.

Module 08 - Pandas for data analysis and machine learning

8.1 Introduction to Python dataframes
8.2 Importing data from JSON, CSV, Excel, SQL database, NumPy array to dataframe
8.3 Various data operations like selecting, filtering, sorting, viewing, joining, combining

Hands-on Exercise –
1. Working on importing data from JSON files
2. Selecting record by a group
3. Applying filter on top, viewing records

Module 09 - Exception Handling

9.1 Introduction to Exception Handling
9.2 Scenarios in Exception Handling with its execution
9.3 Arithmetic exception
9.4 Raising an exception with raise
9.5 What a random list is, and running a random list in Jupyter Notebook
9.6 Value Error in Exception Handling.

Hands-on Exercise –
1. Demo on Exception Handling with an Industry-based Use Case.
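
The ideas in this module (arithmetic exceptions, `ValueError`, and `raise`) can be sketched in one small function. The function `safe_ratio` and the `attempts` list are hypothetical names chosen for this example.

```python
attempts = []   # simple audit trail, filled in by the finally clause


def safe_ratio(numerator, denominator):
    """Divide two values, translating bad input into clear outcomes."""
    try:
        result = float(numerator) / float(denominator)
    except ZeroDivisionError:
        # An arithmetic exception: handled here by returning a sentinel.
        return None
    except ValueError as exc:
        # float("ten") raises ValueError; re-raise with more context.
        raise ValueError(
            f"non-numeric input: {numerator!r}, {denominator!r}"
        ) from exc
    else:
        # Runs only when the try block raised nothing.
        return result
    finally:
        # Runs on every path, including the returns and the raise above.
        attempts.append((numerator, denominator))


print(safe_ratio(10, 4))
print(safe_ratio(1, 0))
try:
    safe_ratio("ten", 2)
except ValueError as err:
    print("rejected:", err)
```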

Module 10 - Multi Threading & Race Condition

10.1 Introduction to Thread, need of threads
10.2 What are thread functions
10.3 Performing various operations on thread like joining a thread, starting a thread, enumeration in a thread
10.4 Creating a Multithread, finishing the multithreads.
10.5 Understanding Race Condition, lock and Synchronization.

Hands-on Exercise –
1. Demo on Starting a Thread and a Multithread and then perform multiple operations on them.
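
A minimal sketch of starting and joining several threads while guarding shared state with a lock. The class name `SafeCounter` and the thread and iteration counts are arbitrary choices for this example.

```python
import threading


class SafeCounter:
    """A counter whose increments are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self, times):
        for _ in range(times):
            # Without the lock, the read-modify-write in "+= 1" can
            # interleave across threads and lose updates (a race condition).
            with self._lock:
                self.value += 1


counter = SafeCounter()
threads = [
    threading.Thread(target=counter.increment, args=(10_000,))
    for _ in range(4)
]
for t in threads:
    t.start()       # begin running each worker
for t in threads:
    t.join()        # wait for every worker to finish

print(counter.value)
```

Because every increment happens while holding the lock, the final value equals the number of threads times the iterations per thread.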

Module 11 - Packages and Functions

11.1 Intro to modules in Python, need of modules
11.2 How to import modules in python
11.3 Locating a module, namespace and scoping
11.4 Arithmetic operations on Modules using a function
11.5 Intro to Search path, Global and local functions, filter functions
11.6 Python Packages, import in packages, various ways of accessing the packages
11.7 Decorators, pointer assignments, and xlrd

Hands-on Exercise –
1. Demo on Importing the modules and performing various operation on them using arithmetic functions
2. Importing various packages and accessing them and then performing different operations on them.
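
To illustrate importing a module and writing a decorator, here is a small sketch. The decorator `log_calls` and the function `hypotenuse` are hypothetical; `math` and `functools` are standard-library modules.

```python
import functools
import math          # importing a standard-library module


def log_calls(func):
    """Decorator that records each call's arguments and result."""

    @functools.wraps(func)      # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        wrapper.history.append((args, kwargs, result))
        return result

    wrapper.history = []        # attribute stored on the wrapper function
    return wrapper


@log_calls
def hypotenuse(a, b):
    """Length of the hypotenuse of a right triangle."""
    return math.hypot(a, b)


print(hypotenuse(3, 4), hypotenuse.__name__, hypotenuse.history)
```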

Module 12 - Web scraping with Python

12.1 Introduction to web scraping in Python
12.2 Installing Beautiful Soup
12.3 Installing Python parser lxml
12.4 Various web scraping libraries, beautifulsoup, Scrapy Python packages
12.5 Creating soup object with input HTML
12.6 Searching of tree, full or partial parsing, output print

Hands-on Exercise –
1. Installation of Beautiful soup and lxml Python parser
2. Making a soup object with input HTML file
3. Navigating using Py objects in soup tree.

What projects will I be working on in this Python certification course?

Project 01: Analyzing the Naming Pattern Using Python

Industry: General

Problem Statement: How to analyze the trends and the most popular baby names

Topics: In this Python project, you will work with data from the United States Social Security Administration (SSA), which has published the frequency of baby names from 1880 to 2016. The project requires analyzing the data using different methods. You will visualize the most frequent names, determine the naming trends, and come up with the most popular names for a certain year.

Highlights:

  • Analyzing data using Pandas Library
  • Deploying Data Frame Manipulation
  • Bar and box plots with Matplotlib

Project 02: Python Web Scraping for Data Science

In this project, you will be introduced to web scraping using Python. It covers installing Beautiful Soup and other web scraping libraries, working with common data and page formats on the web, learning the important kinds of objects such as NavigableString, and using the parser and the tree search and navigation options, including searching by CSS class, list, function, and keyword argument.

Project 03: Predicting Customer Churn in a Telecom Company

Industry: Telecommunications

Problem Statement: How to increase the profitability of a telecom major by reducing the churn rate

Topics: In this project, you will work with a telecom company’s customer dataset, which holds details of its telephone subscribers. The columns cover the phone number, call minutes at various times of the day, the charges incurred, the lifetime account duration, and whether the customer has churned by unsubscribing from services. The goal is to predict whether a customer will eventually churn.

Highlights:

  • Deploy Scikit-Learn ML library
  • Develop code with Jupyter Notebook
  • Build a model using performance metrics

Module 01 - Introduction to Machine Learning

1.1 Need of Machine Learning
1.2 Introduction to Machine Learning
1.3 Types of Machine Learning, such as supervised, unsupervised, and reinforcement learning, Machine Learning with Python, and the applications of Machine Learning

Module 02 - Supervised Learning and Linear Regression

2.1 Introduction to supervised learning and the types of supervised learning, such as regression and classification
2.2 Introduction to regression
2.3 Simple linear regression
2.4 Multiple linear regression and assumptions in linear regression
2.5 Math behind linear regression

Hands-on Exercise:

1. Implementing linear regression from scratch with Python
2. Using Python library Scikit-Learn to perform simple linear regression and multiple linear regression
3. Implementing train–test split and predicting the values on the test set
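
The first and third exercises can be sketched together in plain Python. This is a minimal illustration under stated assumptions: the data is synthetic (true intercept 3, true slope 2, Gaussian noise), and the helper names `fit_simple_linear_regression` and `train_test_split` are defined here, not imported from Scikit-Learn.

```python
import random


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly and cut them into train and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


# Synthetic data: y = 3 + 2x plus noise (an assumption for this sketch).
rng = random.Random(42)
data = [(x, 3.0 + 2.0 * x + rng.gauss(0, 0.5)) for x in range(40)]

train, test = train_test_split(data)
intercept, slope = fit_simple_linear_regression(*zip(*train))

# Predict on the held-out rows and measure the mean squared error.
predictions = [intercept + slope * x for x, _ in test]
mse = sum((p - y) ** 2 for p, (_, y) in zip(predictions, test)) / len(test)
print(intercept, slope, mse)
```

The fitted slope and intercept should land close to the values used to generate the data, and the test error should be on the order of the noise variance.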

Module 03 - Classification and Logistic Regression

3.1 Introduction to classification
3.2 Linear regression vs logistic regression
3.3 Math behind logistic regression, detailed formulas, the logit function and odds, confusion matrix and accuracy, true positive rate, false positive rate, and threshold evaluation with ROCR

Hands-on Exercise:

1. Implementing logistic regression from scratch with Python
2. Using Python library Scikit-Learn to perform simple logistic regression and multiple logistic regression
3. Building a confusion matrix to find out accuracy, true positive rate, and false positive rate
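The third exercise can be sketched in plain Python as follows. This assumes binary labels where 1 is the positive class and a 0.5 decision threshold; the probabilities and labels are made-up values for illustration.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
    }

def rates(cm):
    """Derive accuracy, true positive rate, and false positive rate from the counts."""
    total = cm["tp"] + cm["tn"] + cm["fp"] + cm["fn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "tpr": cm["tp"] / (cm["tp"] + cm["fn"]),
        "fpr": cm["fp"] / (cm["fp"] + cm["tn"]),
    }

# Hypothetical predicted probabilities from a logistic regression model.
probabilities = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
actual = [1, 1, 1, 0, 0, 0]

threshold = 0.5
predicted = [1 if p >= threshold else 0 for p in probabilities]

cm = confusion_matrix(actual, predicted)
metrics = rates(cm)
```

Raising or lowering `threshold` trades true positives against false positives, which is the idea behind threshold evaluation.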

Module 04 - Decision Tree and Random Forest

4.1 Introduction to tree-based classification
4.2 Understanding a decision tree, impurity function, entropy, and understanding the concept of information gain for the right split of node
4.3 Understanding the concepts of information gain, impurity function, Gini index, overfitting, pruning, pre-pruning, post-pruning, and cost-complexity pruning
4.4 Introduction to ensemble techniques, bagging, and random forests and finding out the right number of trees required in a random forest

Hands-on Exercise:

1. Implementing a decision tree from scratch in Python
2. Using Python library Scikit-Learn to build a decision tree and a random forest
3. Visualizing the tree and changing the hyper-parameters in the random forest
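As a small illustration of the impurity measures behind a decision tree split, the sketch below computes entropy, Gini index, and information gain for a candidate split. The labels and the split are invented for the example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gini(labels):
    """Gini index of a list of class labels."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with 4 'yes' and 4 'no' samples, and one candidate split.
parent = ["yes"] * 4 + ["no"] * 4
split = [["yes", "yes", "yes", "no"], ["yes", "no", "no", "no"]]

gain = information_gain(parent, split)
```

A tree-building routine would evaluate this gain for each candidate split and keep the largest one.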

Module 05 - Naïve Bayes and Support Vector Machine (self-paced)

5.1 Introduction to probabilistic classifiers
5.2 Understanding Naïve Bayes and math behind the Bayes theorem
5.3 Understanding a support vector machine (SVM)
5.4 Kernel functions in SVM and math behind SVM

Hands-on Exercise:

1. Using Python library Scikit-Learn to build a Naïve Bayes classifier and a support vector classifier

Module 06 - Unsupervised Learning

6.1 Types of unsupervised learning, such as clustering and dimensionality reduction, and the types of clustering
6.2 Introduction to k-means clustering
6.3 Math behind k-means
6.4 Dimensionality reduction with PCA

Hands-on Exercise:

1. Using Python library Scikit-Learn to implement k-means clustering
2. Implementing PCA (principal component analysis) on top of a dataset
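To make the math behind k-means concrete, here is a minimal sketch of the assign-then-update loop on 2-D points. It assumes fixed starting centroids so the result is deterministic; library implementations typically choose starting points differently.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Two well-separated groups of hypothetical points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```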

Module 07 - Natural Language Processing and Text Mining (self-paced)

7.1 Introduction to Natural Language Processing (NLP)
7.2 Introduction to text mining
7.3 Importance and applications of text mining
7.4 How NLP works with text mining
7.5 Reading and writing Word files
7.6 Natural Language Toolkit (NLTK) environment
7.7 Text mining: cleaning, pre-processing, and text classification

Hands-on Exercise:

1. Learning Natural Language Toolkit and NLTK Corpora
2. Reading and writing .txt files from/to a local drive
3. Reading and writing .docx files from/to a local drive

Module 08 - Introduction to Deep Learning

8.1 Introduction to Deep Learning with neural networks
8.2 Biological neural networks vs artificial neural networks
8.3 Understanding the perceptron learning algorithm, introduction to Deep Learning frameworks, and TensorFlow constants, variables, and placeholders

Module 09 - Time Series Analysis (self-paced)

9.1 What is time series? Its techniques and applications
9.2 Time series components
9.3 Moving average, smoothing techniques, and exponential smoothing
9.4 Univariate time series models
9.5 Multivariate time series analysis
9.6 ARIMA model and time series in Python
9.7 Sentiment analysis in Python (Twitter sentiment analysis) and text analysis

Hands-on Exercise:

1. Analyzing time series data
2. Recognizing the nature of a phenomenon from a sequence of measurements that follow a non-random order
3. Forecasting the future values in the series
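The smoothing techniques in 9.3 can be sketched in a few lines. This is an illustrative example with an invented series; the smoothing factor `alpha` of 0.5 is an arbitrary choice.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: each value blends the new observation with the previous estimate."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

series = [10, 12, 14, 16, 18]  # hypothetical measurements
ma = moving_average(series, window=3)
es = exponential_smoothing(series, alpha=0.5)

# With simple exponential smoothing, the last smoothed value serves as the one-step-ahead forecast.
forecast = es[-1]
```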

What projects and case studies will I be working on in this online Machine Learning course?

Project 01: Analyzing the Trends of COVID-19 with Python

Industry: Analytics

Problem Statement: Understanding the trends of COVID-19 spread and checking if restrictions imposed by governments around the world have helped us curb COVID-19 cases and by what degree

Topics: In this project, we will use Data Science and Python and perform visualizations to better understand the data on COVID-19. We will also use time series analysis to make predictions about future cases.

Highlights:

  • Using Pandas to accumulate data from multiple data files
  • Using plotly to create interactive visualizations
  • Using Facebook’s Prophet library to make time series models
  • Visualizing the prediction by combining these technologies

Project 02: Customer Churn Classification

Topics: This is a real-world project that gives you hands-on experience in working with most of the ML algorithms.

Highlights:

  • Manipulating data to gain meaningful insights
  • Visualizing data to figure out trends and patterns among different factors
  • Implementing these algorithms: linear regression, decision tree, and Naïve Bayes

Project 03: Creating a Recommendation System for Movies

Topics: This is a real-world project that gives you hands-on experience in working with a movie recommender system. Depending on what movies are liked by a particular user, you will be in a position to provide data-driven recommendations. This project requires you to deeply understand information filtering, recommender systems, user ‘preference’, and more. You will exclusively work on data related to user details, movie details, and others.

Highlights:

  • Recommendation for movies
  • Two types of predictions: Rating prediction and item prediction
  • Important approaches: Memory-based and model-based
  • Knowing user-based methods in k-nearest neighbor
  • Understanding the item-based method
  • Matrix factorization
  • Singular value decomposition
  • Data Science project discussion
  • Collaborative filtering
  • Business variables overview
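As an illustration of the memory-based, user-based approach listed above, the sketch below predicts a rating as a similarity-weighted average of other users' ratings. The users, movies, and ratings are invented, and cosine similarity over co-rated movies is one of several reasonable similarity choices.

```python
from math import sqrt

# Hypothetical user -> {movie: rating} data.
ratings = {
    "alice": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "bob": {"Movie A": 4, "Movie B": 5, "Movie D": 4},
    "carol": {"Movie A": 1, "Movie C": 5, "Movie D": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the movies both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = sqrt(sum(u[m] ** 2 for m in common))
    norm_v = sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)

def predict_rating(user, movie):
    """Similarity-weighted average of the ratings other users gave to `movie`."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        numerator += similarity * other_ratings[movie]
        denominator += similarity
    return numerator / denominator if denominator else None

prediction = predict_rating("alice", "Movie D")
```

Because alice's tastes are closer to bob's than to carol's, the prediction lands nearer to bob's rating of 4 than to carol's rating of 2.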

Case Study 01: Decision Tree

Topics: Understand the structure of a dataset (PIMA Indians Diabetes database) and create a decision tree model based on it by using Scikit-Learn

Case Study 02: Insurance Cost Prediction (Linear Regression)

Topics: Understand the structure of a medical insurance dataset, implement both simple and multiple linear regressions, and predict values.

Case Study 03: Diabetes Classification (Logistic Regression)

Topics: Understand the structure of a dataset (PIMA Indians Diabetes dataset); implement multiple logistic regressions and classify; fit your model on the test and train data for prediction; evaluate your model using confusion matrix, and then visualize it

Case Study 04: Random Forest

Topics: Create a model that would help in classifying whether a patient ‘is normal,’ ‘is suspected to have a disease,’ or in actuality ‘has the disease’ using the ‘Cardiotocography’ dataset

Case Study 05: Principal Component Analysis (PCA)

Topics: Read the sample Iris dataset; use PCA to figure out the number of most important principal features, and then reduce the number of features using PCA; train and test the random forest classifier algorithm to check if reducing the number of dimensions is causing the model to perform poorly, and figure out the most optimal number that produces good quality results and predicts accurately

Case Study 06: K-means Clustering

Topics: Analyze data; extract useful columns from the dataset; visualize the data; find out the appropriate number of groups or clusters for the data to be segmented (using the elbow method); using k-means clustering, segment the data into k groups (k is found in the previous step); visualize a scatter plot of clusters, and a lot more

Module 01 - Introduction to Deep Learning and Neural Networks

1.1 Field of machine learning, its impact on the field of artificial intelligence
1.2 The benefits of machine learning compared with traditional methodologies
1.3 Deep learning introduction and how it is different from all other machine learning methods
1.4 Classification and regression in supervised learning
1.5 Clustering and association in unsupervised learning, algorithms that are used in these categories
1.6 Introduction to AI and neural networks
1.7 Machine learning concepts
1.8 Supervised learning with neural networks
1.9 Fundamentals of statistics, hypothesis testing, probability distributions, and hidden Markov models.

Module 02 - Multi-layered Neural Networks

2.1 Multi-layer network introduction, regularization, deep neural networks
2.2 Multi-layer perceptron
2.3 Overfitting and capacity
2.4 Neural network hyperparameters, logic gates
2.5 Different activation functions used in neural networks, including ReLU, softmax, sigmoid, and hyperbolic (tanh) functions
2.6 Back propagation, forward propagation, convergence, hyperparameters, and overfitting.
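The activation functions named in 2.5 are short enough to write out directly. The sketch below uses only the standard library; deep learning frameworks provide their own vectorized versions.

```python
from math import exp, tanh

def relu(x):
    """Rectified linear unit: passes positive values, clips negatives to zero."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes any real number into the range (0, 1)."""
    return 1 / (1 + exp(-x))

def softmax(xs):
    """Turns a list of scores into probabilities that sum to 1."""
    largest = max(xs)  # subtracting the max avoids overflow for large scores
    exps = [exp(x - largest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 2.0, 3.0]
probabilities = softmax(scores)
hidden = [relu(-2.0), sigmoid(0.0), tanh(0.0)]
```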

Module 03 - Artificial Neural Networks and Various Methods

3.1 Various methods that are used to train artificial neural networks
3.2 Perceptron learning rule, gradient descent rule, tuning the learning rate, regularization techniques, optimization techniques
3.3 Stochastic process, vanishing gradients, transfer learning, regression techniques
3.4 Lasso (L1) and ridge (L2), unsupervised pre-training, Xavier initialization.
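The perceptron learning rule from 3.2 can be illustrated by training a single neuron on the AND logic gate. The learning rate, epoch count, and zero initial weights below are arbitrary choices for this sketch.

```python
def step(weights, bias, x1, x2):
    """Step activation: fire (1) when the weighted sum is positive."""
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0

def train_perceptron(samples, learning_rate=1.0, epochs=20):
    """Adjust weights by learning_rate * error * input after every sample."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - step(weights, bias, x1, x2)
            weights[0] += learning_rate * error * x1
            weights[1] += learning_rate * error * x2
            bias += learning_rate * error
    return weights, bias

# Truth table of the AND gate.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_gate)
outputs = [step(weights, bias, x1, x2) for (x1, x2), _ in and_gate]
```

AND is linearly separable, so the rule converges; a gate such as XOR would not be learnable by a single perceptron, which is one motivation for multi-layer networks.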

Module 04 - Deep Learning Libraries

4.1 Understanding how deep learning works
4.2 Activation functions, illustrating perceptron, perceptron training
4.3 Multi-layer perceptron, key parameters of perceptron
4.4 Introduction to TensorFlow, the open-source software library used to design, create, and train deep learning models
4.5 Google’s Tensor Processing Unit (TPU), a programmable AI accelerator
4.6 Python libraries in TensorFlow, code basics, variables, constants, placeholders
4.7 Graph visualization, use-case implementation, Keras, and more.

Module 05 - Keras API

5.1 Keras, a high-level neural network API that works on top of TensorFlow
5.2 Defining complex multi-output models
5.3 Composing models using Keras
5.4 Sequential and functional composition, batch normalization
5.5 Deploying Keras with TensorBoard, and neural network training process customization.

Module 06 - TFLearn API for TensorFlow

6.1 Using the TFLearn API to implement neural networks
6.2 Defining and composing models, and deploying TensorBoard

Module 07 - Deep Neural Networks (DNNs)

7.1 Mapping the human mind with deep neural networks (DNNs)
7.2 Several building blocks of artificial neural networks (ANNs)
7.3 The architecture of a DNN and its building blocks
7.4 Reinforcement learning in DNN concepts, various parameters, layers, and optimization algorithms in DNNs, and activation functions.

Module 08 - Convolutional Neural Networks (CNNs)

8.1 What is a convolutional neural network?
8.2 Understanding the architecture and use-cases of CNNs
8.3 What is a pooling layer? How to visualize using a CNN
8.4 How to fine-tune a convolutional neural network
8.5 What is transfer learning?
8.6 Understanding recurrent neural networks, kernel filters, feature maps, and pooling, and deploying convolutional neural networks in TensorFlow.
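To show what a kernel filter, a feature map, and a pooling layer do, the sketch below applies a hand-picked 2x2 filter and 2x2 max pooling to a tiny image in plain Python. The image and filter values are invented; in a trained CNN the filter weights are learned.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) to produce a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# 5x5 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Filter that responds where brightness increases from left to right.
kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d(image, kernel)
pooled = max_pool2d(feature_map)
```

The feature map is non-zero only where the edge sits, and pooling halves each dimension while keeping that response.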

Module 09 - Recurrent Neural Networks (RNNs)

9.1 Introduction to the RNN model
9.2 Use cases of RNNs, modeling sequences
9.3 RNNs with back propagation
9.4 Long short-term memory (LSTM)
9.5 Recursive neural tensor network theory, the basic RNN cell, unfolded RNN, dynamic RNN
9.6 Time-series predictions.

Module 10 - GPUs in Deep Learning

10.1 Introduction to GPUs, how they differ from CPUs, and the significance of GPUs
10.2 Deep learning networks, forward pass and backward pass training techniques
10.3 GPU constituents with simpler cores and concurrent hardware.

Module 11 - Autoencoders and Restricted Boltzmann Machine (RBM)

11.1 Introduction to RBM and autoencoders
11.2 Deploying RBM for deep neural networks, using RBM for collaborative filtering
11.3 Features and applications of autoencoders.

Module 12 - Deep Learning Applications

12.1 Image processing
12.2 Natural language processing (NLP), speech recognition, and video analytics.

Module 13 - Chatbots

13.1 Automated conversation bots leveraging any of the following descriptive techniques: IBM Watson, Microsoft’s LUIS, open- and closed-domain bots
13.2 Generative models and the sequence-to-sequence model (LSTM).

What projects will I be working on during this online AI course?

Project 01: Image Recognition with TensorFlow

Industry: Internet Search

Problem Statement: Creating a Deep Learning model to identify the right object on the Internet as per the user search for the corresponding image

Description: In this project, you will learn how to build a convolutional neural network using Google TensorFlow. You will do the visualization of images using training, providing input images, losses, and distributions of activations and gradients. You will learn to break each image into manageable tiles and input them to the convolutional neural network for the desired result.

Highlights:

  • Constructing a convolutional neural network using TensorFlow
  • Convolutional, dense, and pooling layers of CNNs
  • Filtering images based on user queries

Project 02: Building an AI-based Chatbot using IBM Watson Lab

Industry: Ecommerce

Problem Statement: Building a chatbot using Artificial Intelligence

Description: In this project, by understanding the customer needs, you will be able to offer the right services through Artificial Intelligence chatbots. You will learn how to create the right artificial neural network with the right number of layers to ensure that the customer queries are comprehensible to the Artificial Intelligence chatbot. This will help you understand Natural Language Processing, going beyond keywords, data parsing, and providing the right solutions.

Highlights:

  • Breaking user queries into components
  • Building neural networks with TensorFlow
  • Understanding Natural Language Processing

Project 03: Ecommerce Product Recommendation

Industry: Ecommerce

Problem Statement: Recommending the right products to customers using Artificial Intelligence with TensorFlow

Description: This project involves working with recommender systems to provide the right product recommendation to customers with TensorFlow. You will learn how to use Artificial Intelligence to check for users’ past buying habits, find out the products that go hand-in-hand, and recommend the best products that can be bought together with a particular product.

Highlights:

  • Building neural networks with TensorFlow
  • Looking at huge amounts of data and gaining insights
  • Building a recommendation engine with TensorFlow Graph

Module 01 - Hadoop Installation and Setup

1.1 The architecture of Hadoop cluster
1.2 What is High Availability and Federation?
1.3 How to set up a production cluster?
1.4 Various shell commands in Hadoop
1.5 Understanding configuration files in Hadoop
1.6 Installing a single node cluster with Cloudera Manager
1.7 Understanding Spark, Scala, Sqoop, Pig, and Flume

Module 02 - Introduction to Big Data Hadoop and Understanding HDFS and MapReduce

2.1 Introducing Big Data and Hadoop
2.2 What is Big Data and where does Hadoop fit in?
2.3 Two important Hadoop ecosystem components, namely, MapReduce and HDFS
2.4 In-depth Hadoop Distributed File System – Replications, Block Size, Secondary Name node, High Availability and in-depth YARN – resource manager and node manager

Hands-on Exercise:

1. HDFS working mechanism
2. Data replication process
3. How to determine the size of the block?
4. Understanding a data node and name node

Module 03 - Deep Dive in MapReduce

3.1 Learning the working mechanism of MapReduce
3.2 Understanding the mapping and reducing stages in MR
3.3 Various terminologies in MR like Input Format, Output Format, Partitioners, Combiners, Shuffle, and Sort

Hands-on Exercise:

1. How to write a WordCount program in MapReduce?
2. How to write a Custom Partitioner?
3. What is a MapReduce Combiner?
4. How to run a job in a local job runner
5. Deploying a unit test
6. What is a map side join and reduce side join?
7. What is a tool runner?
8. How to use counters, dataset joining with map side, and reduce side joins?
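The WordCount, partitioner, and shuffle ideas above can be traced in a small single-process simulation. This is a plain-Python sketch of the data flow only; it does not use the Hadoop API, and the CRC32-based partitioner is a stand-in chosen because it gives stable results across runs.

```python
from collections import defaultdict
from zlib import crc32

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def partition(key, num_reducers):
    """Partitioner: pick the reducer that will receive this key."""
    return crc32(key.encode("utf-8")) % num_reducers

def shuffle(pairs, num_reducers):
    """Shuffle: group values by key inside each reducer's partition."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)][key].append(value)
    return partitions

def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

lines = ["big data big ideas", "data moves fast"]  # hypothetical input split
pairs = [pair for line in lines for pair in map_phase(line)]

counts = {}
for part in shuffle(pairs, num_reducers=2):
    for key in sorted(part):  # keys reach a reducer in sorted order
        word, total = reduce_phase(key, part[key])
        counts[word] = total
```

A combiner would apply the same summing logic to each mapper's output before the shuffle, reducing the number of pairs sent across the network.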

Module 04 - Introduction to Hive

4.1 Introducing Hadoop Hive
4.2 Detailed architecture of Hive
4.3 Comparing Hive with Pig and RDBMS
4.4 Working with Hive Query Language
4.5 Creation of a database, table, group by and other clauses
4.6 Various types of Hive tables, HCatalog
4.7 Storing the Hive Results, Hive partitioning, and Buckets

Hands-on Exercise:

1. Database creation in Hive
2. Dropping a database
3. Hive table creation
4. How to change the database?
5. Data loading
6. Dropping and altering table
7. Pulling data by writing Hive queries with filter conditions
8. Table partitioning in Hive
9. What is a group by clause?

Module 05 - Advanced Hive and Impala

5.1 Indexing in Hive
5.2 The Map Side Join in Hive
5.3 Working with complex data types
5.4 The Hive user-defined functions
5.5 Introduction to Impala
5.6 Comparing Hive with Impala
5.7 The detailed architecture of Impala

Hands-on Exercise: 

1. How to work with Hive queries?
2. The process of joining the table and writing indexes
3. External table and sequence table deployment
4. Data storage in a different table

Module 06 - Introduction to Pig

6.1 Apache Pig introduction and its various features
6.2 Various data types and schema in Hive
6.3 The available functions in Pig, Hive Bags, Tuples, and Fields

Hands-on Exercise: 

1. Working with Pig in MapReduce and local mode
2. Loading of data
3. Limiting data to 4 rows
4. Storing the data into files and working with Group By, Filter By, Distinct, Cross, Split in Hive

Module 07 - Flume, Sqoop and HBase

7.1 Apache Sqoop introduction
7.2 Importing and exporting data
7.3 Performance improvement with Sqoop
7.4 Sqoop limitations
7.5 Introduction to Flume and understanding the architecture of Flume
7.6 What is HBase and the CAP theorem?

Hands-on Exercise: 

1. Working with Flume to generate Sequence Number and consume it
2. Using the Flume Agent to consume the Twitter data
3. Using AVRO to create Hive Table
4. AVRO with Pig
5. Creating Table in HBase
6. Deploying Disable, Scan, and Enable Table

Module 08 - Writing Spark Applications Using Scala

8.1 Using Scala for writing Apache Spark applications
8.2 Detailed study of Scala
8.3 The need for Scala
8.4 The concept of object-oriented programming
8.5 Executing the Scala code
8.6 Various classes in Scala like getters, setters, constructors, abstract, extending objects, overriding methods
8.7 The Java and Scala interoperability
8.8 The concept of functional programming and anonymous functions
8.9 Bobsrockets package and comparing the mutable and immutable collections
8.10 Scala REPL, Lazy Values, Control Structures in Scala, Directed Acyclic Graph (DAG), first Spark application using SBT/Eclipse, Spark Web UI, Spark in Hadoop ecosystem.

Hands-on Exercise:

1. Writing Spark application using Scala
2. Understanding the robustness of Scala for Spark real-time analytics operation

Module 09 - Spark framework

9.1 Detailed Apache Spark and its various features
9.2 Comparing with Hadoop
9.3 Various Spark components
9.4 Combining HDFS with Spark and Scalding
9.5 Introduction to Scala
9.6 Importance of Scala and RDD

Hands-on Exercise: 

1. The Resilient Distributed Dataset (RDD) in Spark
2. How does it help to speed up Big Data processing?

Module 10 - RDD in Spark

10.1 Understanding the Spark RDD operations
10.2 Comparison of Spark with MapReduce
10.3 What is a Spark transformation?
10.4 Loading data in Spark
10.5 Types of RDD operations viz. transformation and action
10.6 What is a Key/Value pair?

Hands-on Exercise: 

1. How to deploy RDD with HDFS?
2. Using the in-memory dataset
3. Using file for RDD
4. How to define the base RDD from an external file?
5. Deploying RDD via transformation
6. Using the Map and Reduce functions
7. Working on word count and count log severity

Module 11 - Data Frames and Spark SQL

11.1 The detailed Spark SQL
11.2 The significance of SQL in Spark for working with structured data processing
11.3 Spark SQL JSON support
11.4 Working with XML data and parquet files
11.5 Creating Hive Context
11.6 Writing Data Frame to Hive
11.7 How to read a JDBC file?
11.8 Significance of a Spark data frame
11.9 How to create a data frame?
11.10 What is schema manual inferring?
11.11 Work with CSV files, JDBC table reading, data conversion from Data Frame to JDBC, Spark SQL user-defined functions, shared variable, and accumulators
11.12 How to query and transform data in Data Frames?
11.13 How do Data Frames provide the benefits of both Spark RDD and Spark SQL?
11.14 Deploying Hive on Spark as the execution engine

Hands-on Exercise:

1. Data querying and transformation using Data Frames
2. Finding out the benefits of Data Frames over Spark SQL and Spark RDD

Module 12 - Machine Learning Using Spark (MLlib)

12.1 Introduction to Spark MLlib
12.2 Understanding various algorithms
12.3 What is a Spark iterative algorithm?
12.4 Spark graph processing analysis
12.5 Introducing Machine Learning
12.6 K-Means clustering
12.7 Spark variables like shared and broadcast variables
12.8 What are accumulators?
12.9 Various ML algorithms supported by MLlib
12.10 Linear regression, logistic regression, decision tree, random forest, and K-means clustering techniques

Hands-on Exercise: 

1. Building a recommendation engine

Module 13 - Integrating Apache Flume and Apache Kafka

13.1 Why Kafka?
13.2 What is Kafka?
13.3 Kafka architecture
13.4 Kafka workflow
13.5 Configuring Kafka cluster
13.6 Basic operations
13.7 Kafka monitoring tools
13.8 Integrating Apache Flume and Apache Kafka

Hands-on Exercise:

1. Configuring Single Node Single Broker Cluster
2. Configuring Single Node Multi Broker Cluster
3. Producing and consuming messages
4. Integrating Apache Flume and Apache Kafka.

Module 14 - Spark Streaming

14.1 Introduction to Spark streaming
14.2 The architecture of Spark streaming
14.3 Working with the Spark streaming program
14.4 Processing data using Spark streaming
14.5 Requesting count and DStream
14.6 Multi-batch and sliding window operations
14.7 Working with advanced data sources
14.8 Features of Spark streaming
14.9 Spark Streaming workflow
14.10 Initializing StreamingContext
14.11 Discretized Streams (DStreams)
14.12 Input DStreams and Receivers
14.13 Transformations on DStreams
14.14 Output Operations on DStreams
14.15 Windowed operators and their uses
14.16 Important Windowed operators and Stateful operators
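The sliding window idea in 14.6 and 14.15 can be pictured without a cluster. The sketch below is a plain-Python analogy, not Spark code: each inner list stands for one micro-batch, and the window length and slide interval are expressed as numbers of batches, which is an assumption made for this illustration.

```python
from collections import Counter, deque

def windowed_counts(batches, window_length, slide_interval):
    """Every `slide_interval` batches, count items across the last `window_length` batches."""
    window = deque(maxlen=window_length)  # old batches fall out automatically
    for index, batch in enumerate(batches, start=1):
        window.append(batch)
        if index % slide_interval == 0:
            counts = Counter()
            for recent in window:
                counts.update(recent)
            yield index, dict(counts)

# Hypothetical stream of log levels arriving in four micro-batches.
batches = [["error", "info"], ["info"], ["error", "error"], ["warn"]]

results = list(windowed_counts(batches, window_length=3, slide_interval=2))
```

The second result covers batches 2 to 4 only, so the "info" and "error" entries from the first batch no longer contribute to it.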

Hands-on Exercise:

1. Twitter Sentiment analysis
2. Streaming using Netcat server
3. Kafka-Spark streaming
4. Spark-Flume streaming

Module 15 - Hadoop Administration – Multi-node Cluster Setup Using Amazon EC2

15.1 Create a 4-node Hadoop cluster setup
15.2 Running the MapReduce Jobs on the Hadoop cluster
15.3 Successfully running the MapReduce code
15.4 Working with the Cloudera Manager setup

Hands-on Exercise:

1. The method to build a multi-node Hadoop cluster using an Amazon EC2 instance
2. Working with the Cloudera Manager

Module 16 - Hadoop Administration – Cluster Configuration

16.1 Overview of Hadoop configuration
16.2 The importance of Hadoop configuration file
16.3 The various parameters and values of configuration
16.4 The HDFS parameters and MapReduce parameters
16.5 Setting up the Hadoop environment
16.6 The Include and Exclude configuration files
16.7 The administration and maintenance of name node, data node directory structures, and files
16.8 What is a File system image?
16.9 Understanding Edit log

Hands-on Exercise:

1. The process of performance tuning in MapReduce

Module 17 - Hadoop Administration – Maintenance, Monitoring and Troubleshooting

17.1 Introduction to the checkpoint procedure, name node failure
17.2 How to ensure the recovery procedure, Safe Mode, Metadata and Data backup, various potential problems and solutions, what to look for and how to add and remove nodes

Hands-on Exercise:

1. How to go about ensuring the MapReduce File System Recovery for different scenarios
2. JMX monitoring of the Hadoop cluster
3. How to use the logs and stack traces for monitoring and troubleshooting
4. Using the Job Scheduler for scheduling jobs in the same cluster
5. Getting the MapReduce job submission flow
6. FIFO schedule
7. Getting to know the Fair Scheduler and its configuration

Module 18 - ETL Connectivity with Hadoop Ecosystem (Self-Paced)

18.1 How do ETL tools work in the Big Data industry?
18.2 Introduction to ETL and data warehousing
18.3 Working with prominent use cases of Big Data in the ETL industry
18.4 End-to-end ETL PoC showing Big Data integration with an ETL tool

Hands-on Exercise:

1. Connecting to HDFS from an ETL tool
2. Moving data from the local system to HDFS
3. Moving data from a DBMS to HDFS
4. Working with Hive using an ETL tool
5. Creating a MapReduce job in an ETL tool

Module 19 - Project Solution Discussion and Cloudera Certification Tips and Tricks

19.1 Working towards the solution of the Hadoop project
19.2 Its problem statements and the possible solution outcomes
19.3 Preparing for the Cloudera certifications
19.4 Points to focus on for scoring the highest marks
19.5 Tips for cracking Hadoop interview questions

Hands-on Exercise:

1. The project of a real-world high value Big Data Hadoop application
2. Getting the right solution based on the criteria set by the Intellipaat team

Following topics will be available only in self-paced mode:

Module 20 - Hadoop Application Testing

20.1 Importance of testing
20.2 Unit testing, Integration testing, Performance testing, Diagnostics, Nightly QA test, Benchmark and end-to-end tests, Functional testing, Release certification testing, Security testing, Scalability testing, Commissioning and Decommissioning of data nodes testing, Reliability testing, and Release testing

Module 21 - Roles and Responsibilities of Hadoop Testing Professional

21.1 Understanding the Requirement
21.2 Preparation of the Testing Estimation
21.3 Test Cases, Test Data, Test Bed Creation, Test Execution, Defect Reporting, Defect Retest, Daily Status report delivery, Test completion, ETL testing at every stage (HDFS, Hive and HBase) while loading the input (logs, files, records, etc.) using Sqoop/Flume, which includes but is not limited to data verification, Reconciliation, User Authorization and Authentication testing (Groups, Users, Privileges, etc.), reporting defects to the development team or manager and driving them to closure
21.4 Consolidating all the defects and creating defect reports
21.5 Validating new features and issues in Core Hadoop

Module 22 - Framework Called MRUnit for Testing of MapReduce Programs

22.1 Reporting defects to the development team or manager and driving them to closure
22.2 Consolidating all the defects and creating defect reports
22.3 Creating a testing framework using MRUnit for testing of MapReduce programs

Module 23 - Unit Testing

23.1 Automation testing using Oozie
23.2 Data validation using the QuerySurge tool

Module 24 - Test Execution

24.1 Test plan for HDFS upgrade
24.2 Test automation and result

Module 25 - Test Plan Strategy and Writing Test Cases for Testing Hadoop Application

25.1 Test, install and configure

What Hadoop Projects Will You Be Working on?

Project 01: Working with MapReduce, Hive and Sqoop

Industry: General

Problem Statement: How to successfully import data using Sqoop into HDFS for data analysis

Topics: As part of this project, you will work on various Hadoop components such as MapReduce, Apache Hive, and Apache Sqoop. You will use Sqoop to import data from a relational database management system such as MySQL into HDFS. You need to deploy Hive for summarizing, querying, and analyzing data. You have to convert SQL queries using HiveQL for deploying MapReduce on the transferred data. You will gain considerable proficiency in Hive and Sqoop after completing this project.

Highlights:

1.1 Sqoop data transfer from RDBMS to Hadoop
1.2 Coding in Hive Query Language
1.3 Data querying and analysis

Project 02: Work on MovieLens data for finding the top movies

Industry: Media and Entertainment

Problem Statement: How to create the top-ten-movies list using the MovieLens data

Topics: In this project, you will work exclusively on the rating datasets available through MovieLens. The project involves writing a MapReduce program to analyze the MovieLens data and create a list of the top ten movies. You will also use Apache Pig and Apache Hive to work with distributed datasets and analyze them.

Highlights:

2.1 MapReduce program for working on the data file
2.2 Apache Pig for analyzing data
2.3 Apache Hive data warehousing and querying
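The core computation of this project, ranking movies by average rating, can be sketched on a handful of records. The tuples below are invented and only mimic a (user, movie, rating) layout; the `min_ratings` cut-off is an assumption added so that a movie with a single rating does not top the list. The project itself would express the same logic in MapReduce, Pig, or Hive.

```python
from collections import defaultdict
import heapq

# Hypothetical (user_id, movie_id, rating) records.
ratings = [
    (1, "m1", 5), (2, "m1", 4),
    (1, "m2", 3), (3, "m2", 2),
    (2, "m3", 5), (3, "m3", 5),
    (4, "m4", 5),
]

def top_movies(ratings, n, min_ratings=2):
    """Return the n movies with the highest average rating."""
    totals = defaultdict(lambda: [0, 0])  # movie_id -> [rating sum, rating count]
    for _, movie_id, rating in ratings:
        totals[movie_id][0] += rating
        totals[movie_id][1] += 1
    averages = {
        movie_id: total / count
        for movie_id, (total, count) in totals.items()
        if count >= min_ratings
    }
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])

top = top_movies(ratings, n=2)
```

In MapReduce terms, the loop that fills `totals` corresponds to the map and reduce steps keyed by movie, and the final ranking is a second pass over the averages.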

Project 03:  Hadoop YARN Project; End-to-end PoC

Industry: Banking

Problem Statement: How to bring the daily data (incremental data) into the Hadoop Distributed File System

Topics: In this project, transaction data is recorded and stored daily in the RDBMS. This data is then transferred every day into HDFS for further Big Data analytics. You will work on a live Hadoop YARN cluster. YARN is the part of the Hadoop ecosystem that lets Hadoop decouple from MapReduce and support more competitive processing and a wider array of applications. You will work on the YARN central resource manager.

Highlights:

3.1 Using Sqoop commands to bring the data into HDFS
3.2 End-to-end flow of transaction data
3.3 Working with the data from HDFS

Project 04: Table Partitioning in Hive

Industry: Banking

Problem Statement:  How to improve the query speed using Hive data partitioning

Topics: This project involves working with Hive table data partitioning. Ensuring the right partitioning helps to read the data, deploy it on the HDFS and run the MapReduce jobs at a much faster rate. Hive lets you partition data in multiple ways. This will give you hands-on experience in partitioning of Hive tables manually, deploying single SQL execution in dynamic partitioning and bucketing of data so as to break it into manageable chunks.

Highlights:

4.1 Manual Partitioning
4.2 Dynamic Partitioning
4.3 Bucketing
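The difference between partitioning and bucketing can be pictured with a few rows in plain Python. This is an analogy rather than Hive code: the rows and column names are invented, and CRC32 stands in for Hive's own hash function.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical transaction rows.
rows = [
    {"txn_id": 101, "country": "IN", "amount": 250},
    {"txn_id": 102, "country": "US", "amount": 90},
    {"txn_id": 103, "country": "IN", "amount": 40},
    {"txn_id": 104, "country": "UK", "amount": 75},
]

def partition_rows(rows, column):
    """Partitioning: one group per distinct column value, named like a column=value directory."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"{column}={row[column]}"].append(row)
    return dict(partitions)

def bucket_rows(rows, column, num_buckets):
    """Bucketing: spread rows over a fixed number of buckets by hashing the column value."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        digest = crc32(str(row[column]).encode("utf-8"))
        buckets[digest % num_buckets].append(row)
    return buckets

partitions = partition_rows(rows, "country")
india_only = partitions["country=IN"]  # a query filtered on country reads just this group
buckets = bucket_rows(rows, "txn_id", num_buckets=2)
```

A query that filters on the partition column can skip every other partition, which is where the speed-up described in the problem statement comes from.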

Project 05: Connecting Pentaho with Hadoop Ecosystem

Industry: Social Network

Problem Statement:  How to deploy ETL for data analysis activities

Topics: This project lets you connect Pentaho with the Hadoop ecosystem. Pentaho works well with HDFS, HBase, Oozie and ZooKeeper. You will connect the Hadoop cluster with Pentaho data integration, analytics, Pentaho server and report designer. This project will give you complete working knowledge on the Pentaho ETL tool.

Highlights:

5.1 Working knowledge of ETL and Business Intelligence
5.2 Configuring Pentaho to work with Hadoop distribution
5.3 Loading, transforming and extracting data into Hadoop cluster

Project 06: Multi-node Cluster Setup

Industry: General

Problem Statement: How to set up a Hadoop real-time cluster on Amazon EC2

Topics: This project gives you the opportunity to work on a real-world Hadoop multi-node cluster setup in a distributed environment. You will get a complete demonstration of working with various Hadoop cluster master and slave nodes, installing Java as a prerequisite for running Hadoop, installing Hadoop, and mapping the nodes in the Hadoop cluster.

Highlights:

6.1 Hadoop installation and configuration
6.2 Running a Hadoop multi-node using a 4-node cluster on Amazon EC2
6.3 Deploying a MapReduce job on the Hadoop cluster

Project 07: Hadoop Testing Using MRUnit

Industry: General

Problem Statement:  How to test MapReduce applications

Topics: In this project, you will gain proficiency in Hadoop MapReduce code testing using MRUnit. You will learn about real-world scenarios of deploying MRUnit, Mockito and PowerMock. This will give you hands-on experience in various testing tools for Hadoop MapReduce. After completion of this project you will be well-versed in test-driven development and will be able to write light-weight test units that work specifically on the Hadoop architecture.

Highlights:

7.1 Writing JUnit tests using MRUnit for MapReduce applications
7.2 Mocking static methods using PowerMock and Mockito
7.3 MapReduce Driver for testing the map and reduce pair

Project 08: Hadoop Web Log Analytics

Industry: Internet Services

Problem Statement: How to derive insights from web log data

Topics: This project involves making sense of web log data in order to derive valuable insights from it. You will load the server data onto a Hadoop cluster using various techniques. The web log data can include the URLs visited, cookie data, user demographics, location, and the date and time of web service access. In this project, you will transport the data using Apache Flume or Kafka, and handle workflow and data cleansing using MapReduce, Pig, or Spark. The insights thus derived can be used for analyzing customer behavior and predicting buying patterns.

Highlights:

8.1 Aggregation of log data
8.2 Apache Flume for data transportation
8.3 Processing of data and generating analytics
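
As a small illustration of the processing step, the sketch below parses a few log lines and counts hits per URL. It assumes the logs follow the Common Log Format; the sample lines and the names `parse_line` and `hits_per_url` are hypothetical. In the project itself this work would run on the cluster with MapReduce, Pig or Spark rather than in a single Python process.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def hits_per_url(lines):
    """Cleanse (skip malformed lines) and aggregate the number of hits per URL."""
    records = (parse_line(line) for line in lines)
    return Counter(record["url"] for record in records if record is not None)


SAMPLE_LOG = [  # hypothetical log lines
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /home HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2023:13:56:01 +0000] "GET /cart HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:13:57:12 +0000] "GET /home HTTP/1.1" 304 -',
    'malformed line',
]

print(hits_per_url(SAMPLE_LOG).most_common())
```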

Project 09: Hadoop Maintenance

Industry: General

Problem Statement:  How to administer a Hadoop cluster

Topics: This project involves maintaining and managing a Hadoop cluster. You will work on a number of important tasks, including recovering data, recovering from failure, adding and removing machines from the Hadoop cluster, and onboarding users on Hadoop.

Highlights:

9.1 Working with name node directory structure
9.2 Audit logging, data node block scanner and balancer
9.3 Failover, fencing, DISTCP and Hadoop file formats

Project 10: Twitter Sentiment Analysis

Industry: Social Media

Problem Statement: Find out how people reacted to India's demonetization move by analyzing their tweets

Topics: This project involves analyzing tweets to see what people are saying about the demonetization decision taken by the Indian government. You then look for key words and phrases and score them using a dictionary that assigns each word a value based on the sentiment it conveys.

Highlights:

10.1 Download the tweets and load into Pig storage
10.2 Divide tweets into words to calculate sentiment
10.3 Rating the words from +5 to −5 using the AFINN dictionary
10.4 Filtering the tweets and analyzing sentiment
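
The scoring steps above can be sketched in a few lines of Python. The project itself uses Pig; this is only an illustration of the logic. The `LEXICON` below is a hypothetical mini-dictionary in the same style (integer scores between −5 and +5) and is not copied from the real word list, and the sample tweets are invented.

```python
import re

# Hypothetical mini-lexicon (integer scores from -5 to +5).
# The entries are illustrative and are not copied from a real word list.
LEXICON = {"great": 3, "support": 2, "good": 3, "bad": -3, "worst": -3, "chaos": -2}


def tokenize(tweet):
    """Split a tweet into lowercase words, dropping punctuation and digits."""
    return re.findall(r"[a-z']+", tweet.lower())


def sentiment_score(tweet):
    """Sum the lexicon scores of all words; words not in the lexicon score 0."""
    return sum(LEXICON.get(word, 0) for word in tokenize(tweet))


def label(tweet):
    """Classify a tweet by the sign of its total score."""
    score = sentiment_score(tweet)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


TWEETS = [  # invented examples
    "Great move, I support it",
    "Worst decision, total chaos",
    "No comment",
]

for tweet in TWEETS:
    print(label(tweet), sentiment_score(tweet), tweet)
```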

Project 11: Analyzing IPL T20 Cricket

Industry:  Sports and Entertainment

Problem Statement: Analyze the entire cricket match and get answers to any question regarding the details of the match

Topics:  This project involves working with the IPL dataset that has information regarding batting, bowling, runs scored, wickets taken and more. This dataset is taken as input, and then it is processed so that the entire match can be analyzed based on the user queries or needs.

Highlights:

11.1 Load the data into HDFS
11.2 Analyze the data using Apache Pig or Hive
11.3 Based on user queries give the right output

Apache Spark Projects

Project 01: Movie Recommendation

Industry: Entertainment

Problem Statement: How to recommend the most appropriate movie to a user based on their taste

Topics: This is a hands-on Apache Spark project built around the real-world application of movie recommendations. This project helps you gain essential knowledge of Spark MLlib, which is a Machine Learning library; you will learn how to apply collaborative filtering, regression, clustering and dimensionality reduction using Spark MLlib. Upon finishing the project, you will have first-hand experience in Apache Spark streaming data analysis, sampling, testing and statistics, among other vital skills.

Highlights:

1.1 Apache Spark MLlib component
1.2 Statistical analysis
1.3 Regression and clustering
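
To give a feel for collaborative filtering, here is a minimal sketch in plain Python. It is not Spark MLlib code: it uses a simple user-based approach with cosine similarity, chosen only because it fits in a few lines. The ratings and the names `cosine` and `recommend` are hypothetical.

```python
from math import sqrt

RATINGS = {  # hypothetical data: user -> {movie: rating}
    "ana": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "ben": {"Alien": 4, "Brazil": 5, "Dune": 5},
    "cy": {"Casablanca": 5, "Dune": 2, "Alien": 1},
}


def cosine(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[movie] * b[movie] for movie in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank the movies the user has not seen by similarity-weighted average rating."""
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], other_ratings)
        if similarity <= 0:
            continue
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                totals[movie] = totals.get(movie, 0.0) + similarity * rating
                weights[movie] = weights.get(movie, 0.0) + similarity
    scores = {movie: totals[movie] / weights[movie] for movie in totals}
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ana", RATINGS))
```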

Project 02: Twitter API Integration for Tweet Analysis

Industry: Social Media

Problem Statement:  Analyzing the user sentiment based on the tweet

Topics: This is a hands-on Twitter analysis project that uses the Twitter API to analyze tweets. You will integrate the Twitter API and write the essential server-side code in Python or PHP. Finally, you will be able to read the results of various operations by filtering, parsing and aggregating the data according to the tweet analysis requirement.

Highlights:

2.1 Making requests to Twitter API
2.2 Building the server-side code
2.3 Filtering, parsing and aggregating data

Project 03: Data Exploration Using Spark SQL – Wikipedia Data Set

Industry: Internet

Problem Statement:  Making sense of Wikipedia data using Spark SQL

Topics: In this project you will be using the Spark SQL tool for analyzing the Wikipedia data. You will gain hands-on experience in integrating Spark SQL for various applications like batch analysis, Machine Learning, visualizing and processing of data and ETL processes, along with real-time analysis of data.

Highlights:

3.1 Machine Learning using Spark
3.2 Deploying data visualization
3.3 Spark SQL integration

Introduction to Data Visualization and Power of Tableau

What data visualization is, its benefits compared with reading raw numbers, real use cases from various business domains, some quick and powerful examples using Tableau without going into the technical details of Tableau, installing Tableau, the Tableau interface, connecting to a data source, Tableau data types, and data preparation.

Architecture of Tableau

Installation of Tableau Desktop, architecture of Tableau, interface of Tableau (Layout, Toolbars, Data Pane, Analytics Pane, etc.), how to start with Tableau, and the ways to share and export the work done in Tableau.

Hands-on Exercise: Play with Tableau desktop, learn about the interface, and share and export existing works.

Working with Metadata and Data Blending

Connection to Excel, cubes and PDFs, management of metadata and extracts, data preparation, Joins (Left, Right, Inner, and Outer) and Union, dealing with NULL values, cross-database joining, data extraction, data blending, refresh extraction, incremental extraction, how to build an extract, etc.

Hands-on Exercise: Connect to an Excel sheet to import data, use metadata and extracts, manage NULL values, clean up data before using it, perform the join techniques, execute data blending from multiple sources, etc.

Creation of Sets

Mark, highlight, sort, group, and use sets (creating and editing sets, IN/OUT, sets in hierarchies), constant sets, computed sets, bins, etc.

Hands-on Exercise: Use marks to create and edit sets, highlight the desired items, make groups, apply sorting on results, and make hierarchies among the created sets.

Working with Filters

Filters (addition and removal), filtering continuous dates, dimensions, and measures, interactive filters, marks card, hierarchies, how to create folders in Tableau, sorting in Tableau, types of sorting, filtering in Tableau, types of filters, filtering the order of operations, etc.

Hands-on Exercise: Use the data set by date/dimensions/measures to add filter, use interactive filter to view the data, customize/remove filters to view the result, etc.

Organizing Data and Visual Analytics

Using Formatting Pane to work with menu, fonts, alignments, settings, and copy-paste; formatting data using labels and tooltips, edit axes and annotations, k-means cluster analysis, trend and reference lines, visual analytics in Tableau, forecasting, confidence interval, reference lines, and bands.

Hands-on Exercise: Apply labels and tooltips to graphs, annotations, edit axes’ attributes, set the reference line, and perform k-means cluster analysis on the given dataset.

Working with Mapping

Working on coordinate points, plotting longitude and latitude, editing unrecognized locations, customizing geocoding, polygon maps, WMS: web mapping services, working on the background image, including add image, plotting points on images and generating coordinates from them; map visualization, custom territories, map box, WMS map; how to create map projects in Tableau, creating dual axes maps, and editing locations.

Hands-on Exercise: Plot longitude and latitude on a geo map, edit locations on the geo map, custom geocoding, use images of the map and plot points, find coordinates, create a polygon map, and use WMS.

Working with Calculations and Expressions

Calculation syntax and functions in Tableau, various types of calculations, including Table, String, Date, Aggregate, Logic, and Number; LOD expressions, including concept and syntax; aggregation and replication with LOD expressions, nested LOD expressions; levels of details: fixed level, lower level, and higher level;  quick table calculations, the creation of calculated fields, predefined calculations, and how to validate.
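
To make the idea of a fixed level of detail concrete, the following Python sketch mimics what a Tableau expression such as `{ FIXED [Region] : SUM([Sales]) }` computes: an aggregate at the region level that is replicated onto every row, whatever the level of detail of the view. The rows and the names `fixed_sum` and `share_of_region` are hypothetical; this is an illustration of the concept, not of how Tableau evaluates it internally.

```python
from collections import defaultdict

ROWS = [  # hypothetical order rows
    {"region": "East", "city": "Boston", "sales": 100},
    {"region": "East", "city": "NYC", "sales": 300},
    {"region": "West", "city": "LA", "sales": 200},
]


def fixed_sum(rows, dimension, measure):
    """Aggregate the measure per value of the dimension and attach it to every row."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return [dict(row, fixed_total=totals[row[dimension]]) for row in rows]


def share_of_region(rows):
    """Each city's share of its region's sales, using the replicated regional total."""
    return [
        round(row["sales"] / row["fixed_total"], 2)
        for row in fixed_sum(rows, "region", "sales")
    ]


print(share_of_region(ROWS))
```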

Working with Parameters

Creating parameters, parameters in calculations, using parameters with filters, column selection parameters, chart selection parameters, how to use parameters in the filter session, how to use parameters in calculated fields, how to use parameters in reference line, etc.

Hands-on Exercise: Creating new parameters to apply on a filter, passing parameters to filters to select columns, passing parameters to filters to select charts, etc.

Charts and Graphs

Dual axes graphs, histograms: single and dual axes; box plot; charts: motion, Pareto, funnel, pie, bar, line, bubble, bullet, scatter, and waterfall charts; maps: tree and heat maps; market basket analysis (MBA), using Show me; and text table and highlighted table.

Hands-on Exercise: Plot a histogram, tree map, heat map, funnel chart, and more using the given dataset and also perform market basket analysis (MBA) on the same dataset.

Dashboards and Stories

Building and formatting a dashboard using size, objects, views, filters, and legends; best practices for making creative as well as interactive dashboards using actions; creating stories, including an introduction to story points; creating as well as updating story points, adding catchy visuals in stories, adding annotations with descriptions; dashboards and stories: what a dashboard is, highlight actions, URL actions, and filter actions, selecting and clearing values, best practices to create dashboards, dashboard examples; using the Tableau workspace and Tableau interface; learning about Tableau joins and types of joins; Tableau field types, saving as well as publishing a data source, live vs extract connection, and various file types.

Hands-on Exercise: Create a Tableau dashboard view, include legends, objects, and filters, make the dashboard interactive, and use visual effects, annotations, and descriptions to create and edit a story.

Tableau Prep

Introduction to Tableau Prep, how Tableau Prep helps you quickly combine, join, shape, and clean data for analysis, creation of smart examples with Tableau Prep, getting deeper insights into the data with a great visual experience, making data preparation simpler and more accessible, integrating Tableau Prep with the Tableau analytical workflow, and understanding the seamless process from data preparation to analysis with Tableau Prep.

Integration of Tableau with R and Hadoop

Introduction to R language, applications and use cases of R, deploying R on the Tableau platform, learning R functions in Tableau, and the integration of Tableau with Hadoop.

Hands-on Exercise: Deploy R on Tableau, create a line graph using R interface, and also connect Tableau with Hadoop to extract data.

What are the projects I will be working on during this Tableau certification training?

Project 1 : Analyzing global COVID-19 data with interactive Tableau dashboard

Domain : Healthcare, COVID-19

Problem statement : Analyzing, understanding and comparing the COVID-19 cases across different countries

Description : In this project, you will be working on two data sets with country-wise information, including the number of confirmed cases, the number of deaths, and the number of new cases and deaths per day. The data sets are to be related, joined or blended with each other to build this dashboard. You will apply filters, parameters, actions and calculations wherever necessary to get the desired results according to the problem statements mentioned in the project work. Appropriate charts and graphs are to be chosen depending on factors such as the fields being visualized, the number of values in each field and the problem statement. The final dashboard should be interactive, allowing users to interact with and analyze the data as per their requirements.

Highlights :

  • Cleansing and Combining data sets
  • Using Top filters and parameters that work in a dynamic way
  • Showing results in both log and default axes
  • Enhancing visualizations using available features and mark cards
  • Creating insightful and interactive dashboard
  • Including drop down menus and filters in the final dashboard to make it interactive

Project 2 : Tableau dashboard for analyzing the UK bank customer data.

Domain : Bank customer data

Problem statement : Understanding the Region wise customer details in the UK bank data set provided

Description : In this project, you will be working on bank data that has region-wise customer details with their respective job classifications, gender and age details, and the balances they maintain in the bank. You will be creating pie charts, donut charts, asymmetric drill downs and motion charts for an insightful visualization. A day-wise forecast of the balance is to be calculated using exponential smoothing, an inbuilt forecasting tool in Tableau. The final dashboard should be interactive, with filters and highlighters.

Highlights :

  • Cleansing data, if necessary, using the data interpreter
  • Creating asymmetric drill downs using set actions
  • Histograms for analyzing the distribution of a measure
  • Using LOD expressions to perform calculations in most granular levels
  • Highlighters to colour specific marks of interest
  • Global filters and parameters for interactivity
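
The forecasting idea mentioned in the description can be illustrated with a short Python sketch. It shows only simple exponential smoothing with a fixed smoothing factor `alpha`; Tableau's built-in forecasting selects among exponential smoothing models that may also include trend and seasonal components, so this is a simplification. The function names and the sample balances are hypothetical.

```python
def exponential_smoothing(values, alpha):
    """Return the smoothed series: level = alpha * value + (1 - alpha) * previous level."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # initialise the level with the first observation
    smoothed = [level]
    for value in values[1:]:
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


def forecast_next(values, alpha):
    """The one-step-ahead forecast is the last smoothed level."""
    return exponential_smoothing(values, alpha)[-1]


DAILY_BALANCE = [10, 20, 30]  # hypothetical daily balances
print(exponential_smoothing(DAILY_BALANCE, alpha=0.5))
print(forecast_next(DAILY_BALANCE, alpha=0.5))
```

A larger `alpha` makes the forecast react faster to recent days; a smaller one smooths out more of the day-to-day noise.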

Project 3 : Tableau dashboard for analyzing the Financial data

Domain : Retail, Finance

Problem statement : Analyzing the country-wise product data to understand key performance indicators such as sales and profit in order to improve the performance and sales of the products

Description : In this project, you will be analyzing the country wise sales and profit for each of its segments and products. World maps are to be used for an interactive analysis with detailed tool tips. Country maps are displayed using interactive filters. Motion charts and customized shapes are used for enhancing visualizations. Annotations and drop lines are inserted wherever necessary. Phone and tablet layouts are added for enabling mobility of dashboards after publishing. Analyzing the outliers for each country is a major problem statement in this project.

Highlights :

  • Maps are created using geographic data type fields
  • Single value list parameters for interactive analysis
  • URL actions to direct users to required web pages
  • Box plots to identify outliers in the data
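
A box plot flags outliers visually; the rule behind it can be sketched in Python as follows. This assumes the common convention of treating points more than 1.5 times the interquartile range (IQR) beyond the quartiles as outliers. The profit figures and the name `iqr_outliers` are hypothetical, and the exact quartile values depend on the interpolation method used.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [value for value in values if value < lower or value > upper]


PROFITS = [10, 12, 13, 14, 15, 16, 18, 95]  # hypothetical profit figures
print(iqr_outliers(PROFITS))
```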

Project 4 : Tableau dashboard for understanding the agricultural data

Domain : Agriculture

Problem statement : Agricultural Area, yield and production wise analysis per state

Description : In this project, you will analyze and understand data corresponding to a few states of India. Various seasonal crop categories and the details of the crops under each category are provided for detailed analysis. Interactive drill-down tree maps are to be used for insightful visualizations. Ranking crops based on their yield value per year, seasonal pie charts with production details, and district-wise charts are a few of the requirements and problem statements of this project.

Highlights :

  • Cleansing data before building dashboards
  • Drill down tree maps using set actions
  • Parameters and actions for interactivity
  • Manually created hierarchies for drill downs
  • Action by filter to use worksheets as filters

Introduction to SAS

Installation and introduction to SAS, how to get started with SAS, how to work with data sets, understanding the various SAS windows such as Output, Search, Editor, Log and Explorer, and understanding SAS functions, library types and programming files

SAS Enterprise Guide

How to import and export raw data files, how to read and subset the data sets, different statements like SET, MERGE and WHERE

Hands-on Exercise: How to import the Excel file in the workspace and how to read data and export the workspace to save data

SAS Operators and Functions

Different SAS operators like logical, comparison and arithmetic, deploying different SAS functions like Character, Numeric, Is Null, Contains, Like and Input/Output, along with the conditional statements like If/Else, Do While, Do Until and so on

Hands-on Exercise: Performing operations using the SAS functions and logical and arithmetic operations

Compilation and Execution

Understanding the input buffer and the PDV (program data vector) that work in the backend, and learning what MISSOVER is

Using Variables

Defining and using KEEP and DROP statements, and applying these statements along with formats and labels in SAS

Hands-on Exercise: Use KEEP and DROP statements

Creation and Compilation of SAS Data Sets

Understanding the delimiter, dataline rules, DLM, delimiter DSD, raw data files and execution and list input for standard data

Hands-on Exercise: Use delimiter rules on raw data files

SAS Procedures

Various standard built-in SAS procedures: PROC SORT, PROC FREQ, PROC SUMMARY, PROC RANK, PROC EXPORT, PROC DATASETS, PROC TRANSPOSE, PROC CORR, etc.

Hands-on Exercise: Use SORT, FREQ, SUMMARY, EXPORT and other procedures

Input Statement and Formatted Input

Reading standard and non-standard numeric inputs with formatted inputs, column pointer controls, controlling while a record loads, line pointer control/absolute line pointer control, single trailing, multiple IN and OUT statements, dataline statement and rules, list input method and comparing single trailing and double trailing

Hands-on Exercise:  Read standard and non-standard numeric inputs with formatted inputs, control while a record loads, control a line pointer and write multiple IN and OUT statements

SAS Format

SAS Format statements: standard and user-written, associating a format with a variable, working with SAS Format, deploying it on PROC data sets and comparing ATTRIB and Format statements

Hands-on Exercise: Format a variable, deploy format rule on PROC data set and use ATTRIB statement

SAS Graphs

Understanding PROC GCHART, various graphs, bar charts: pie, bar and 3D and plotting variables with PROC GPLOT

Hands-on Exercise: Plot graphs using PROC GPLOT and display charts using PROC GCHART

Interactive Data Processing

SAS advanced data discovery and visualization, point-and-click analytics capabilities and powerful reporting tools

Data Transformation Function

Character functions, numeric functions and converting variable type

Hands-on Exercise: Use functions in data transformation

Output Delivery System (ODS)

Introduction to ODS, data optimization and how to generate files (rtf, pdf, html and doc) using SAS

Hands-on Exercise: Optimize data and generate rtf, pdf, html and doc files

SAS Macros

Macro Syntax, macro variables, positional parameters in a macro and macro step

Hands-on Exercise: Write a macro and use positional parameters

PROC SQL

SQL statements in SAS, SELECT, CASE, JOIN and UNION and sorting data

Hands-on Exercise: Create SQL query to select and add a condition and use a CASE in select query

Advanced Base SAS

Base SAS web-based interface and ready-to-use programs, advanced data manipulation, storage and retrieval and descriptive statistics

Hands-on Exercise: Use web UI to do statistical operations

Summarization Reports

Report enhancement, global statements, user-defined formats, PROC SORT, ODS destinations, ODS listing, PROC FREQ, PROC MEANS, PROC UNIVARIATE, PROC REPORT and PROC PRINT

Hands-on Exercise: Use PROC SORT to sort the results, list ODS, find the mean using PROC MEANS and print using PROC PRINT

What projects will I be working on in this SAS training?

Project 1: Categorization of Patients Based on the Count of Drugs for Their Therapy

Domain: Healthcare

Objective: This project aims to compute descriptive statistics and subsets for specific clinical data problems. It will give you a brief insight into Base SAS procedures and data steps.

Problem Statement:

Count the number of patients,

  1. Who were ever on at least one of the four drugs
  2. Who were ever on each of the four drugs
  3. Who had never been on any drug

Output should be four datasets

  1. TYPA – Contains the list of patients from problem 1
  2. TYPB – Contains the list of patients from problem 2
  3. TYPC – Contains the list of patients from problem 3
  4. SUMMARY – Contains the summary of counts for each of three problems
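
The categorization logic can be sketched with Python sets. The project itself is done with Base SAS procedures and data steps; this is only an illustration of the logic. It assumes that "each of the four drugs" means patients who were on all four drugs. The drug codes, patient data and the name `categorize` are hypothetical.

```python
DRUGS = ("A", "B", "C", "D")  # hypothetical drug codes

PATIENTS = {  # hypothetical data: patient id -> set of drugs ever taken
    "P1": {"A"},
    "P2": {"A", "B", "C", "D"},
    "P3": set(),
    "P4": {"B", "D"},
}


def categorize(patients, drugs):
    """Build the TYPA, TYPB and TYPC patient lists and the SUMMARY counts."""
    all_drugs = set(drugs)
    # TYPA: ever on at least one of the drugs
    typa = sorted(p for p, taken in patients.items() if taken & all_drugs)
    # TYPB: ever on every one of the drugs (assumed reading of problem 2)
    typb = sorted(p for p, taken in patients.items() if all_drugs <= taken)
    # TYPC: never on any of the drugs
    typc = sorted(p for p, taken in patients.items() if not taken & all_drugs)
    summary = {"TYPA": len(typa), "TYPB": len(typb), "TYPC": len(typc)}
    return typa, typb, typc, summary


typa, typb, typc, summary = categorize(PATIENTS, DRUGS)
print(typa, typb, typc, summary)
```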

Project 2: Build Revenue Projections Reports

Domain: Sales

Objective: This project will give you hands-on experience in working with the SAS data analytics and business intelligence tool. You will be working on the data entered in a business enterprise setup and will aggregate, retrieve and manage that data. You will learn to create insightful reports and graphs and come up with statistical and mathematical analysis to scientifically predict the revenue projection for a particular future time frame. Upon the completion of the project, you will be well-versed in the practical aspects of data analytics, predictive modeling and data mining.

Project 3: Impact of Pre-paid Plans on the Preferences of Investors

Domain: Finance Market

Objective: The project aims to find the factors that most affect preferences for the pre-paid model; it also identifies which variables are highly correlated with those factors.

Problem Statement:

  • The project aims to identify the reasons why investors prefer or do not prefer the pre-paid model, to understand the penetration of the pre-paid model in brokerage firms, to identify the advantages and disadvantages of the pre-paid scheme, and to identify brand-wise market share. In addition, the project looks for insights that would help a newly established brand foray deeper into the market on a large scale.

Project 4: K-Means Cluster Analysis on Iris Dataset

Domain: Analytics

Objective: K-Means cluster analysis on the Iris dataset to predict the class of a flower from its petal dimensions

Requirements:

  • Using the famous Iris dataset, predict the class of a flower
  • Perform K-Means cluster analysis
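
For readers new to the algorithm, here is a minimal K-Means sketch in plain Python. The project itself is done in SAS; this only illustrates the two alternating steps (assign each point to its nearest centroid, then move each centroid to the mean of its points). The points are hypothetical (petal length, petal width) pairs, not rows from the Iris dataset, and the initial centroids are picked by hand to keep the run deterministic.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def k_means(points, centroids, iterations=10):
    """Alternate the assignment and update steps for a fixed number of iterations."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


POINTS = [  # hypothetical (petal length, petal width) measurements
    (1.4, 0.2), (1.5, 0.2), (1.3, 0.3),
    (4.7, 1.4), (4.5, 1.5), (5.0, 1.7),
]

centroids, clusters = k_means(POINTS, centroids=[POINTS[0], POINTS[3]])
print(centroids)
print(clusters)
```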

Entering Data

Introduction to Excel spreadsheet, learning to enter data, filling of series and custom fill list, editing and deleting fields.

Referencing in Formulas

Learning about relative and absolute referencing, the concept of relative formulae, the issues in relative formulae, creating of absolute and mixed references and various other formulae.
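
The difference between relative, absolute and mixed references can be shown with a small Python sketch that computes how a cell reference changes when a formula is copied. A `$` locks the column or row it precedes; unlocked parts shift with the copy. The function names are hypothetical and the sketch handles single-cell references only.

```python
import re

REFERENCE = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def col_to_num(col):
    """Convert a column label to a number: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for char in col:
        number = number * 26 + (ord(char) - ord("A") + 1)
    return number


def num_to_col(number):
    """Convert a column number back to its label: 1 -> A, 27 -> AA."""
    col = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        col = chr(ord("A") + remainder) + col
    return col


def copy_reference(ref, rows_down, cols_right):
    """Return the reference as it appears after copying the formula by the given offset."""
    match = REFERENCE.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    col_lock, col, row_lock, row = match.groups()
    if not col_lock:  # relative column: shifts with the copy
        col = num_to_col(col_to_num(col) + cols_right)
    if not row_lock:  # relative row: shifts with the copy
        row = str(int(row) + rows_down)
    return f"{col_lock}{col}{row_lock}{row}"


for ref in ("A1", "$A$1", "$A1", "A$1"):
    print(ref, "->", copy_reference(ref, rows_down=2, cols_right=3))
```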

Name Range

Creating named ranges, using names in new formulae, working with the Name Box, selecting a range, creating names from a selection, pasting names in formulae, selecting names and working with Name Manager.

Understanding Logical Functions

The various logical functions in Excel, the IF function for calculating values and displaying text, nested IF functions, and the VLOOKUP and IFERROR functions.

Getting started with Conditional Formatting

Learning about conditional formatting, the options for formatting cells, various operations with icon sets, data bars and color scales, creating and modifying sparklines.

Advanced-level Validation

Multi-level drop-down validation, restricting values to a list only, learning about error messages and cell drop-downs.

Important Formulas in Excel

Introduction to the various formulae in Excel like Sum, SumIF & SumIFs, Count, CountA, CountIF and CountBlank, Networkdays, Networkdays International, Today & Now function, Trim (Eliminating undesirable spaces), Concatenate (Consolidating columns)
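
To show what a few of these formulae compute, here are rough Python equivalents of SUMIF, COUNTIF and NETWORKDAYS. They are simplified sketches: criteria are passed as Python functions rather than Excel criteria strings, and `networkdays` assumes a Monday to Friday working week with the start date on or before the end date. The sample data is hypothetical.

```python
from datetime import date, timedelta


def sumif(criteria_range, predicate, sum_range):
    """Sum the values whose matching criteria cell satisfies the predicate."""
    return sum(value for cell, value in zip(criteria_range, sum_range) if predicate(cell))


def countif(values, predicate):
    """Count the values that satisfy the predicate."""
    return sum(1 for value in values if predicate(value))


def networkdays(start, end, holidays=()):
    """Count Monday-to-Friday days from start to end inclusive, excluding holidays."""
    days = 0
    current = start
    while current <= end:
        if current.weekday() < 5 and current not in holidays:
            days += 1
        current += timedelta(days=1)
    return days


REGIONS = ["East", "West", "East"]  # hypothetical data
SALES = [10, 20, 30]

print(sumif(REGIONS, lambda region: region == "East", SALES))
print(countif([3, 7, 3, 9], lambda value: value > 5))
print(networkdays(date(2024, 1, 1), date(2024, 1, 7)))
```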

Working with Dynamic table

Introduction to dynamic table in Excel, data conversion, table conversion, tables for charts and VLOOKUP.

Data Sorting

Sorting in Excel, various types of sorting, including alphabetical, numerical, row and multiple-column sorting, working with Paste Special, hyperlinking and using subtotals.

Data Filtering

The concept of data filtering, understanding compound filter and its creation, removing of filter, using custom filter and multiple value filters, working with wildcards.

Chart Creation

Creation of Charts in Excel, performing operations in embedded chart, modifying, resizing, and dragging of chart.

Various Techniques of Charting

Introduction to the various types of charting techniques, creating titles for charts, axes, learning about data labels, displaying data tables, modifying axes, displaying gridlines and inserting trendlines, textbox insertion in a chart, creating a 2-axis chart, creating combination chart.

Pivot Tables in Excel

The concept of Pivot tables in Excel, report filtering, shell creation, working with Pivot for calculations, formatting of reports, dynamic range assigning, the slicers and creating of slicers.

Ensuring Data and File Security

Data and file security in Excel, protecting row, column, and cell, the different safeguarding techniques.

Getting started with VBA Macros

Learning about VBA macros in Excel, executing macros in Excel, the macro shortcuts, applications, the concept of relative reference in macros.

Core concepts of VBA

In-depth understanding of Visual Basic for Applications, the VBA Editor, module insertion and deletion, performing action with Sub and ending Sub if condition not met.

Ranges and Worksheet in VBA

Learning about the concepts of workbooks and worksheets in Excel, protection of macro codes, range coding, declaring a variable, the concept of Pivot Table in VBA, introduction to arrays, user forms, getting to know how to work with databases within Excel.

IF condition

Learning how the If condition works and knowing how to apply it in various scenarios, working with multiple Ifs in Macro.

Loops in VBA

Understanding the concept of looping, deploying looping in VBA Macros.

Debugging in VBA

Studying about debugging in VBA, the various steps of debugging like running, breaking, resetting, understanding breakpoints and way to mark it, the code for debugging and code commenting.

Messaging in VBA

The concept of message box in VBA, learning to create the message box, various types of message boxes, the IF condition as related to message boxes.

Practical Projects in VBA

Mastering the various tasks and functions using VBA, understanding data separation, auto filtering, formatting of report, combining multiple sheets into one, merging multiple files together.

Best Practices of Dashboards Visualization

Introduction to powerful data visualization with Excel Dashboard, important points to consider while designing the dashboards like loading the data, managing data and linking the data to tables and charts, creating Reports using dashboard features.

Principles of Charting

Learning to create charts in Excel, the various charts available, the steps to successfully build a chart, personalization of charts, formatting and updating features, various special charts for Excel dashboards, understanding how to choose the right chart for the right data.

Getting started with Pivot Tables

Creation of Pivot Tables in Excel, learning to change the Pivot Table layout, generating Reports, the methodology of grouping and ungrouping of data.

Creating Dashboards

Learning to create Dashboards, the various rules to follow while creating Dashboards, creation of dynamic dashboards, knowing what is data layout, introduction to thermometer chart and its creation, how to use alerts in the Dashboard setup.

Creation of Interactive Components

How to insert a scroll bar into a data window, the concept of option buttons in a chart, use of the combo box drop-down, list box control usage, and how to use the checkbox control.

Data Analysis

Understanding data quality issues in Excel, linking of data, consolidating and merging data, working with dashboards for Excel Pivot Tables.

What projects will I be working on in this Excel certification training?

Project – if Function

Data – Employee

Problem Statement – This project describes the IF function and how to implement it. It includes the following actions:

Calculate the bonus for all employees at 10% of their salary using the IF function; rate the salesmen based on their sales and the rating scale; find the number of times “3” is repeated in the table and the number of values greater than 5 using the Count function; use operators and nested IF functions.

Introduction to NoSQL and MongoDB

RDBMS, types of relational databases, challenges of RDBMS, NoSQL database, its significance, how NoSQL suits Big Data needs, introduction to MongoDB and its advantages, MongoDB installation, JSON features, data types and examples

MongoDB Installation

Installing MongoDB, basic MongoDB commands and operations, MongoChef (MongoGUI) installation and MongoDB data types

Hands-on Exercise: Install MongoDB and install MongoChef (MongoGUI)

Importance of NoSQL

The need for NoSQL, types of NoSQL databases, OLTP, OLAP, limitations of RDBMS, ACID properties, CAP theorem, BASE properties, learning about JSON/BSON, databases, collections and documents, MongoDB uses, MongoDB write concerns (acknowledged, replica acknowledged, unacknowledged, journaled) and fsync

Hands-on Exercise: Write a JSON document
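
As a starting point for this exercise, the sketch below builds a small document as a Python dictionary and serializes it to JSON with the standard `json` module. The field names and values are hypothetical. It shows the JSON building blocks a MongoDB document relies on: strings, booleans, arrays and nested objects.

```python
import json

student = {  # hypothetical document
    "name": "Asha",
    "courses": ["MongoDB", "SQL"],                 # array
    "address": {"city": "Pune", "pin": "411001"},  # nested object
    "active": True,                                # boolean
}

text = json.dumps(student, indent=2)  # Python dict -> JSON text
restored = json.loads(text)           # JSON text -> Python dict

print(text)
```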

CRUD Operations

Understanding CRUD and its functionality, CRUD concepts, MongoDB query and syntax and read and write queries and query optimization

Hands-on Exercise: Use an insert query to create a data entry, use a find query to read data, use update and replace queries to update data, and use delete query operations on a DB file

Data Modeling and Schema Design

Concepts of data modelling, difference between MongoDB and RDBMS modelling, model tree structure, operational strategies, monitoring and backup

Hands-on Exercise: Write a data model tree structure for a family hierarchy
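
One possible shape for this exercise is the parent-references pattern, in which each document stores the `_id` of its parent. The sketch below uses plain Python dictionaries in place of MongoDB documents, so no database or driver is involved; the family members and the helper names `children_of` and `ancestors_of` are hypothetical.

```python
FAMILY = [  # hypothetical documents; each one stores the _id of its parent
    {"_id": "Grandma", "parent": None},
    {"_id": "Mother", "parent": "Grandma"},
    {"_id": "Uncle", "parent": "Grandma"},
    {"_id": "Me", "parent": "Mother"},
]


def children_of(docs, parent_id):
    """Find the direct children of a node by matching on the parent field."""
    return [doc["_id"] for doc in docs if doc["parent"] == parent_id]


def ancestors_of(docs, node_id):
    """Walk up the parent references from a node to the root."""
    by_id = {doc["_id"]: doc for doc in docs}
    chain = []
    parent = by_id[node_id]["parent"]
    while parent is not None:
        chain.append(parent)
        parent = by_id[parent]["parent"]
    return chain


print(children_of(FAMILY, "Grandma"))
print(ancestors_of(FAMILY, "Me"))
```

Parent references keep each document small and make finding children a single lookup on one field, at the cost of one step per level when walking up to the root.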

Data Management and Administration

In this module, you will learn MongoDB® Administration activities such as health check, backup, recovery, database sharding and profiling, data import/export, performance tuning, etc.

Hands-on Exercise: Use shard key and hashed shard keys, perform backup and recovery of a dummy dataset, import data from a CSV file and export data to a CSV file

Data Indexing and Aggregation

Concepts of data aggregation and types and data indexing concepts, properties and variations

Hands-on Exercise: Do aggregation using pipeline, sort, skip and limit and create index on data using single key and using multi-key

MongoDB Security

Understanding database security risks, MongoDB security concept and security approach and MongoDB integration with Java and Robomongo

Hands-on Exercise: MongoDB integration with Java and Robomongo

Working with Unstructured Data

Implementing techniques to work with variety of unstructured data like images, videos, log data and others and understanding GridFS MongoDB file system for storing data

Hands-on Exercise: Work with variety of unstructured data like images, videos, log data and others

What projects will I be working on in this MongoDB training?

Project: Working with the MongoDB Java Driver

Industry: General

Problem Statement: How to create table for video insertion using Java

Topics: In this project, you will work with MongoDB Java Driver and become proficient in creating a table for inserting video using Java programming. You will work with collections and documents and understand the read and write basics of MongoDB database and the Java virtual machine libraries.

Highlights:

  • Setting up the MongoDB Java Driver
  • Connecting to the database
  • Java virtual machine libraries

Introduction to SQL

Various types of databases, introduction to Structured Query Language, distinction between client-server and file-server databases, understanding SQL Server Management Studio, SQL table basics, data types and functions, Transact-SQL, Windows authentication, data control language, and the identification of keywords in T-SQL, such as Drop Table.

Database Normalization and Entity Relationship Model

Data Anomalies, Update Anomalies, Insertion Anomalies, Deletion Anomalies, Types of Dependencies, Functional Dependency, Fully functional dependency, Partial functional dependency, Transitive functional dependency, Multi-valued functional dependency, Decomposition of tables, Lossy decomposition, Lossless decomposition, What is Normalization?, First Normal Form, Second Normal Form, Third Normal Form, Boyce-Codd Normal Form(BCNF), Fourth Normal Form, Entity-Relationship Model, Entity and Entity Set, Attributes and types of Attributes, Entity Sets, Relationship Sets, Degree of Relationship, Mapping Cardinalities, One-to-One, One-to-Many, Many-to-one, Many-to-many, Symbols used in E-R Notation.
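
Functional dependencies are the basis of the normal forms listed above, and checking one against sample data is straightforward. The sketch below tests whether a dependency X → Y holds in a set of rows: it holds when no two rows agree on X but differ on Y. The table, its columns and the name `holds` are hypothetical.

```python
def holds(rows, determinant, dependent):
    """Return True if the functional dependency determinant -> dependent holds."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = tuple(row[column] for column in dependent)
        # setdefault stores the first value seen for this key and returns it later
        if seen.setdefault(key, value) != value:
            return False
    return True


ENROLLMENT = [  # hypothetical table with the composite key (student_id, course)
    {"student_id": 1, "course": "SQL", "student_name": "Asha", "grade": "A"},
    {"student_id": 1, "course": "Excel", "student_name": "Asha", "grade": "B"},
    {"student_id": 2, "course": "SQL", "student_name": "Ravi", "grade": "B"},
]

# student_name depends on only part of the key: a partial dependency.
print(holds(ENROLLMENT, ["student_id"], ["student_name"]))
# grade needs the whole key.
print(holds(ENROLLMENT, ["student_id"], ["grade"]))
print(holds(ENROLLMENT, ["student_id", "course"], ["grade"]))
```

In this sample, `student_name` is determined by `student_id` alone, which is the kind of partial dependency that Second Normal Form removes by moving student details into their own table.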

SQL Operators

Introduction to relational databases, fundamental concepts of relational rows, tables, and columns; several operators (such as logical and relational), constraints, domains, indexes, stored procedures, primary and foreign keys, understanding group functions, the unique key, etc.

Working with SQL: Join, Tables, and Variables

Advanced concepts of SQL tables, SQL functions, operators & queries, table creation, data retrieval from tables, combining rows from tables using inner, outer, cross, and self joins, deploying operators such as ‘intersect,’ ‘except,’ ‘union,’ temporary table creation, set operator rules, table variables, etc.
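
As a rough illustration of the joins and set operators listed in this module, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere, and the customers and orders tables are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# INNER JOIN keeps only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT OUTER JOIN keeps every customer; a missing order shows up as NULL (None).
outer_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# EXCEPT returns the customer ids that never appear in the orders table.
no_orders = conn.execute("""
    SELECT id FROM customers
    EXCEPT
    SELECT customer_id FROM orders
""").fetchall()

print(inner_rows)
print(outer_rows)
print(no_orders)
```

The same SELECT statements should carry over to T-SQL largely unchanged; only the setup code around them is specific to this sketch.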

Deep Dive into SQL Functions

Understanding SQL functions – what do they do?, scalar functions, aggregate functions, functions that can be used on different datasets, such as numbers, characters, strings, and dates, inline SQL functions, general functions, and duplicate functions.

Working with Subqueries

Understanding SQL subqueries, their rules; statements and operators with which subqueries can be used, using the set clause to modify subqueries, understanding different types of subqueries, such as where, select, insert, update, delete, etc., and methods to create and view subqueries.

SQL Views, Functions, and Stored Procedures

Learning SQL views, methods of creating, using, altering, renaming, dropping, and modifying views; understanding stored procedures and their key benefits, working with stored procedures, studying user-defined functions, and error handling.

Deep Dive into User-defined Functions

User-defined functions; types of UDFs, such as scalar, inline table-valued, and multi-statement table-valued; stored procedures and when to deploy them; the rank function; triggers and when to execute them.

SQL Optimization and Performance

SQL Server Management Studio, using pivot in MS Excel and MS SQL Server, differentiating between Char, Varchar, and NVarchar, XML PATH, indexes and their creation, records grouping, advantages, searching, sorting, modifying data; clustered indexes creation, use of indexes to cover queries, common table expressions, and index guidelines.

Managing Data with Transact-SQL

Creating Transact-SQL queries, querying multiple tables using joins, implementing functions and aggregating data, modifying data, determining the results of DDL statements on supplied tables and data, and constructing DML statements using the output statement.

Querying Data with Advanced Transact-SQL Components

Querying data using subqueries and APPLY, querying data using table expressions, grouping and pivoting data using queries, querying temporal data and non-relational data, constructing recursive table expressions to meet business requirements, and using windowing functions to group and rank the results of a query.
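
To make two of these topics concrete, the following is a minimal sketch of a recursive table expression and a ranking window function. It assumes Python's built-in sqlite3 module (with SQLite 3.25 or later for window functions) in place of SQL Server, and the employees table is hypothetical. Note that SQLite writes WITH RECURSIVE, whereas T-SQL uses plain WITH for recursive common table expressions.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the employees table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER, salary REAL
    );
    INSERT INTO employees VALUES
        (1, 'Dana', NULL, 300.0),
        (2, 'Eli', 1, 200.0),
        (3, 'Fay', 1, 200.0),
        (4, 'Gus', 2, 100.0);
""")

# Recursive CTE: start at the top manager and walk down the reporting chain.
hierarchy = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, chain.depth + 1
        FROM employees AS e
        JOIN chain ON e.manager_id = chain.id
    )
    SELECT name, depth FROM chain ORDER BY depth, id
""").fetchall()

# Window function: RANK() gives tied salaries the same rank and skips the next one.
ranked = conn.execute("""
    SELECT name, RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employees
    ORDER BY salary_rank, id
""").fetchall()

print(hierarchy)
print(ranked)
```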

Programming Databases Using Transact-SQL

Creating database programmability objects by using T-SQL, implementing error handling and transactions, implementing transaction control in conjunction with error handling in stored procedures, and implementing data types and NULL.

Designing and Implementing Database Objects

Designing and implementing relational database schema; designing and implementing indexes, learning to compare between indexed and included columns, implementing clustered index, and designing and deploying views and column store views.

Implementing Programmability Objects

Explaining foreign key constraints, using T-SQL statements, usage of Data Manipulation Language (DML), designing the components of stored procedures, implementing input and output parameters, applying error handling, executing control logic in stored procedures, and designing trigger logic, DDL triggers, etc.

Managing Database Concurrency

Applying transactions, using the transaction behavior to identify DML statements, learning about implicit and explicit transactions, isolation levels management, understanding concurrency and locking behavior, and using memory-optimized tables.
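
The idea of an explicit transaction that either commits as a whole or rolls back can be sketched as follows. This is an illustration only: it assumes Python's built-in sqlite3 module rather than SQL Server (where you would write BEGIN TRANSACTION, COMMIT, and ROLLBACK in T-SQL), and the accounts table and transfer helper are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the accounts table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 50.0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one unit of work (hypothetical helper)."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit is undone too.
        return False


first = transfer(conn, 1, 2, 30.0)    # enough funds, so both updates commit
second = transfer(conn, 1, 2, 500.0)  # debit would go negative, so both roll back
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(first, second, balances)
```

The second transfer credits the destination first and then fails on the debit, which shows that the rollback undoes the earlier statement as well.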

Optimizing Database Objects

Accuracy of statistics, formulating statistics maintenance tasks, dynamic management objects management, identifying missing indexes, examining and troubleshooting query plans, consolidating the overlapping indexes, the performance management of database instances, and SQL server performance monitoring.

Advanced Topics

Correlated Subquery, Grouping Sets, Rollup, Cube

Hands-on Exercise

Implementing Correlated Subqueries, Using EXISTS with a Correlated Subquery, Using Union Query, Using Grouping Set Query, Using Rollup, Using CUBE to generate four grouping sets, Perform a partial CUBE.
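
The subquery part of this exercise might look like the minimal sketch below. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the departments and employees tables are hypothetical. GROUPING SETS, ROLLUP, and CUBE are not shown because SQLite does not support them.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; both tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary REAL
    );
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Ops'), (3, 'Legal');
    INSERT INTO employees VALUES
        (1, 'Ian', 1, 100.0),
        (2, 'Joy', 1, 300.0),
        (3, 'Kai', 2, 200.0);
""")

# EXISTS with a correlated subquery: the inner query refers to d.id from the
# outer query, so it is evaluated once per department row.
staffed = conn.execute("""
    SELECT d.name
    FROM departments AS d
    WHERE EXISTS (SELECT 1 FROM employees AS e WHERE e.dept_id = d.id)
    ORDER BY d.id
""").fetchall()

# Correlated scalar subquery: compare each salary with the average salary
# of that employee's own department.
above_avg = conn.execute("""
    SELECT e.name
    FROM employees AS e
    WHERE e.salary > (
        SELECT AVG(e2.salary) FROM employees AS e2 WHERE e2.dept_id = e.dept_id
    )
    ORDER BY e.id
""").fetchall()

print(staffed)
print(above_avg)
```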

Microsoft Courses: Study Material

  • Performance Tuning and Optimizing SQL Databases
  • Querying Data with Transact-SQL

What are the projects I will be working on during this Microsoft SQL certification training?

Project 1: Writing Complex Subqueries

Industry: General

Problem Statement: How to create subqueries using SQL?

Topics: This project will give you hands-on experience in working with SQL subqueries and utilizing them in various scenarios. Some of the subqueries that you will be working with and gaining hands-on experience in are: IN or NOT IN, ANY or ALL, EXISTS or NOT EXISTS, and other major queries.

Highlights:

  • Accessing and manipulating databases
  • Operators and control statements in SQL
  • Executing queries in SQL against databases

Project 2: Querying a Large Relational Database

Industry: General

Problem Statement: How to get details about customers by querying the database?

Topics: In this project, you will work on downloading a database and restoring it on the server. You will then query the database to get customer details like name, phone number, email ID, sales made in a particular month, increase in month-on-month sales, and even the total sales made to a particular customer.

Highlights:

  • Table basics and data types
  • Various SQL operators
  • Various SQL functions

Project 3: Relational Database Design

Industry: General

Problem Statement: How to convert a relational design into a table in SQL Server?

Topics: In this project, you will work on converting a relational design that lists various users, user roles, user accounts, and their statuses. You will create the tables in SQL Server, insert at least two rows into each of them, and ensure that you have created the respective foreign keys.

Highlights:

  • Defining relations/attributes
  • Defining the primary keys
  • Creating foreign keys
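
A design of this kind could be sketched roughly as follows. This is an illustration under stated assumptions, not the project solution: it uses Python's built-in sqlite3 module instead of SQL Server, and every table and column name is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE roles (role_id INTEGER PRIMARY KEY, role_name TEXT NOT NULL);
    CREATE TABLE statuses (status_id INTEGER PRIMARY KEY, status_name TEXT NOT NULL);
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(user_id),
        role_id INTEGER NOT NULL REFERENCES roles(role_id),
        status_id INTEGER NOT NULL REFERENCES statuses(status_id)
    );
    -- At least two rows in each table, as the project asks.
    INSERT INTO roles VALUES (1, 'admin'), (2, 'viewer');
    INSERT INTO statuses VALUES (1, 'active'), (2, 'suspended');
    INSERT INTO users VALUES (1, 'lena'), (2, 'omar');
    INSERT INTO accounts VALUES (1, 1, 1, 1), (2, 2, 2, 2);
""")

# An account pointing at a user that does not exist should be rejected.
try:
    conn.execute("INSERT INTO accounts VALUES (3, 99, 1, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

account_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(rejected, account_count)
```

Keeping roles and statuses in their own tables means an account can only reference values that actually exist, which is the point of creating the foreign keys.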

Certification

This is a comprehensive course that is designed to help you clear multiple certifications such as:

  • Spark component of Cloudera Spark and Hadoop Developer Certification (CCA175)
  • Tableau Desktop Qualified Associate Exam
  • SAS Certified Base Programmer Exam
  • C100DEV: MongoDB Certified Developer Associate Exam
  • Microsoft 70-761 SQL Server Certification Exam
  • Microsoft 70-762 SQL Server Certification Exam

Furthermore, you will also be rewarded with the title ‘Data Scientist’ for completing the following learning path that is co-created with IBM:

  • Deep Learning with TensorFlow
  • Build Chatbots with Watson Assistant
  • R for Data Science
  • Spark MLlib
  • Python for Data Science

The complete course is created and delivered in association with IBM to help you get top jobs in the world’s best organizations. The entire training includes real-world projects and case studies that are highly valuable.

Upon the completion of the training, you will have quizzes that will help you prepare for the above-mentioned certification exams and score top marks.

Intellipaat Certification is awarded once you successfully complete the project work and it has been reviewed by experts. Intellipaat certification is recognized by some of the biggest companies like Cisco, Cognizant, Mu Sigma, TCS, Genpact, Hexaware, Sony and Ericsson, among others.

Our alumni work at 3000+ top companies

Course Advisor

Suresh Paritala

Solutions Architect at Microsoft, USA

A Senior Software Architect at NextGen Healthcare who has previously worked with IBM Corporation, Suresh Paritala has worked on Big Data, Data Science, Advanced Analytics, Internet of Things and Azure, along with AI domains like Machine Learning and Deep Learning. He has successfully implemented high-impact.

David Callaghan

Big Data Solutions Architect, USA

An experienced Blockchain Professional who has been bringing integrated Blockchain, particularly Hyperledger and Ethereum, and Big Data solutions to the cloud, David Callaghan has previously worked on Hadoop, AWS Cloud, Big Data and Pentaho projects that have had major impact on revenues of marquee brands around the world.

Samanth Reddy

Data Team Lead at Sony, USA

A renowned Data Scientist who has worked with Google and is currently working at ASCAP, Samanth Reddy has a proven ability to develop Data Science strategies that have a high impact on the revenues of various organizations. He comes with strong Data Science expertise and has created decisive Data Science strategies for Fortune 500 corporations.

Frequently Asked Questions

What Is Intellipaat’s Master’s Course and How Is It Different from Individual Courses?

Intellipaat’s master’s course is a structured learning path, specially designed by industry experts, that helps you become a Data Science expert. Individual courses at Intellipaat focus on one or two specializations. However, if you want to master Data Science, then this program is for you.

At Intellipaat, you can enroll in either the instructor-led online training or self-paced training. Apart from this, Intellipaat also offers corporate training for organizations to upskill their workforce. All trainers at Intellipaat have 12+ years of relevant industry experience, and they have been actively working as consultants in the same domain, which has made them subject matter experts. Go through the sample videos to check the quality of our trainers.

Intellipaat offers 24/7 query resolution, and you can raise a ticket with the dedicated support team at any time. You can use email support for all your queries. If your query is not resolved through email, we can also arrange one-on-one sessions with our trainers.

You would be glad to know that you can contact Intellipaat support even after the completion of the training. We also do not put a limit on the number of tickets you can raise for query resolution and doubt clearance.

Intellipaat offers self-paced training to those who want to learn at their own pace. This training also gives you the benefits of query resolution through email, live sessions with trainers, round-the-clock support, and access to the learning modules on LMS for a lifetime. Also, you get the latest version of the course material at no added cost.

Intellipaat’s self-paced training is priced 75 percent lower than the online instructor-led training. If you face any problems while learning, we can always arrange a virtual live class with the trainers as well.

Intellipaat offers the most updated, relevant, and high-value real-world projects as part of the training program. This way, you can apply what you have learned in a real-world industry setup. All training comes with multiple projects that thoroughly test your skills, learning, and practical knowledge, making you completely industry-ready.

You will work on highly exciting projects in the domains of high technology, ecommerce, marketing, sales, networking, banking, insurance, etc. After completing the projects successfully, your skills will be equal to 6 months of rigorous industry experience.

Intellipaat actively provides placement assistance to all learners who have successfully completed the training. For this, we have exclusive tie-ups with over 80 top MNCs from around the world. This way, you can be placed in outstanding organizations such as Sony, Ericsson, TCS, Mu Sigma, Standard Chartered, Cognizant, and Cisco, among other equally great enterprises. We also help you with job interview and résumé preparation.

You can definitely make the switch from self-paced training to online instructor-led training by simply paying the extra amount. You can join the very next batch, which will be duly notified to you.

Once you complete Intellipaat’s training program, including the real-world projects, quizzes, and assignments, and score at least 60 percent in the qualifying exam, you will be awarded Intellipaat’s course completion certificate. This certificate is well recognized in Intellipaat-affiliated organizations, including over 80 top MNCs from around the world and some of the Fortune 500 companies.

No. Our job assistance program is aimed at helping you land your dream job. It offers a potential opportunity for you to explore various competitive openings in the corporate world and find a well-paid job that matches your profile. The final decision on hiring will always be based on your performance in the interview and the requirements of the recruiter.
