Module 01 - Introduction to Data Science with R
1.1 What is Data Science
1.2 Significance of Data Science in today’s digitally driven world, applications of Data Science, lifecycle of Data Science, components of the Data Science lifecycle
1.3 Introduction to Big Data and Hadoop, introduction to Machine Learning and Deep Learning
1.4 Introduction to R programming and RStudio
1. Installation of RStudio
2. Implementing simple mathematical operations and logic using R operators, loops, if statements and switch cases.
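As a rough illustration of the second exercise above, the following base R sketch combines arithmetic operators, a for loop, an if statement and switch(). The function name describe_number and the sample values are hypothetical, chosen only for this example.

```r
# Hypothetical helper: classify a number using %%, if/else and switch().
describe_number <- function(x) {
  parity <- if (x %% 2 == 0) "even" else "odd"  # modulo operator and if/else
  switch(parity,
         even = paste(x, "is even"),
         odd  = paste(x, "is odd"))
}

# Sum the squares of 1..5 with a for loop.
total <- 0
for (i in 1:5) {
  total <- total + i^2
}

print(total)
print(describe_number(7))
```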
Module 02 - Data Exploration
2.1 Introduction to data exploration
2.2 Importing and exporting data to/from external sources
2.3 What is exploratory data analysis, data importing, dataframes
2.4 Working with dataframes, accessing individual elements, vectors and factors, operators, in-built functions, conditional and looping statements, user-defined functions, matrix, list and array.
1. Accessing individual elements of customer churn data
2. Modifying and extracting the results from the dataset using user-defined functions in R.
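The two exercises above could look roughly like the sketch below. The data frame is a tiny hypothetical stand-in for the customer churn data; the column names and the tenure_band helper are assumptions made for illustration.

```r
# Hypothetical three-row stand-in for the customer churn data.
churn_data <- data.frame(
  customer = c("A", "B", "C"),
  tenure   = c(5, 24, 60),
  churn    = factor(c("Yes", "No", "No"))
)

# Accessing individual elements: by row/column, by column name, and factor levels.
second_tenure <- churn_data[2, "tenure"]
all_tenures   <- churn_data$tenure
churn_levels  <- levels(churn_data$churn)

# User-defined function (hypothetical) that labels customers by tenure.
tenure_band <- function(months) {
  ifelse(months < 12, "new", "established")
}
churn_data$band <- tenure_band(churn_data$tenure)

print(churn_data)
```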
Module 03 - Data Manipulation
3.1 Need for Data Manipulation
3.2 Introduction to dplyr package
3.3 Selecting one or more columns with select() function, Filtering out records on the basis of a condition with filter() function, Adding new columns with the mutate() function, Sampling & Counting
3.4 Combining different functions with the pipe operator, implementing SQL-like operations with sqldf.
1. Implementing dplyr
2. Performing various operations for abstracting over how data is manipulated and stored.
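A minimal sketch of the dplyr verbs named in this module, chained with the pipe operator. It assumes the dplyr package is installed; the toy data frame and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; column names are assumptions for illustration.
customers <- data.frame(
  id              = 1:5,
  tenure          = c(2, 30, 12, 48, 6),
  monthly_charges = c(70, 55, 90, 40, 85),
  churn           = c("Yes", "No", "Yes", "No", "No")
)

result <- customers %>%
  filter(tenure >= 6) %>%                              # keep rows matching a condition
  mutate(total_charges = tenure * monthly_charges) %>% # add a derived column
  select(id, churn, total_charges)                     # keep only these columns

churn_counts <- count(customers, churn)                # rows per churn value

print(result)
print(churn_counts)
```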
Module 04 - Data Visualization
4.1 Introduction to visualization
4.2 Different types of graphs, Introduction to grammar of graphics & ggplot2 package, Understanding categorical distribution with geom_bar() function, understanding numerical distribution with geom_histogram() function, building frequency polygons with geom_freqpoly(), making a scatter-plot with geom_point() function
4.3 Multivariate analysis with geom_boxplot
4.4 Univariate Analysis with Bar-plot, histogram and Density Plot, multivariate distribution
4.5 Bar-plots for categorical variables using geom_bar(), adding themes with the theme() layer
4.6 Visualization with plotly package & building web applications with Shiny, frequency-plots with geom_freqpoly(), multivariate distribution with scatter-plots and smooth lines, continuous vs categorical with box-plots, subgrouping the plots
4.7 Working with coordinates and themes to make the graphs more presentable, Intro to plotly & various plots, visualization with ggvis package
4.8 Geographic visualization with ggmap(), building web applications with Shiny.
1. Creating data visualizations with ggplot2 charts to understand the customer churn ratio
2. Using plotly for importing and analyzing data in grids
3. Visualize tenure, monthly charges, total charges and other individual columns by using the scatter plot.
Module 05 - Introduction to Statistics
5.1 Why do we need Statistics?
5.2 Categories of Statistics, Statistical Terminologies, Types of Data, Measures of Central Tendency, Measures of Spread
5.3 Correlation & Covariance, Standardization & Normalization, Probability & Types of Probability, Hypothesis Testing, Chi-Square testing, ANOVA, normal distribution, binomial distribution.
1. Building a statistical analysis model that uses quantifications, representations and experimental data for gathering, reviewing, analyzing and drawing conclusions from data.
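The standardization, normalization, covariance and correlation topics listed in 5.3 can be tried out in a few lines of base R. The vectors x and y below are made-up values used only for illustration.

```r
# Made-up sample values.
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 5, 7, 9)

# Standardization (z-scores): mean 0, standard deviation 1.
z_scores <- (x - mean(x)) / sd(x)

# Min-max normalization: rescale to the range [0, 1].
min_max <- (x - min(x)) / (max(x) - min(x))

# Covariance and Pearson correlation between x and y.
cov_xy <- cov(x, y)
cor_xy <- cor(x, y)

print(z_scores)
print(min_max)
print(c(covariance = cov_xy, correlation = cor_xy))
```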
Module 06 - Machine Learning
6.1 Introduction to Machine Learning
6.2 Introduction to Linear Regression, predictive modeling with Linear Regression, simple Linear and multiple Linear Regression, concepts and formulas, assumptions and residual diagnostics in Linear Regression, building simple linear model
6.3 Predicting results and finding p-value, introduction to logistic regression
6.4 Comparing linear regression and logistic regression, bivariate & multi-variate logistic regression
6.5 Confusion matrix & accuracy of model, threshold evaluation with ROCR, Linear Regression concepts and detailed formulas, various assumptions of Linear Regression, residuals, qqnorm(), qqline(), understanding the fit of the model, building simple linear model, predicting results and finding p-value
6.6 Understanding the summary results with Null Hypothesis, p-value & F-statistic, building linear models with multiple independent variables.
1. Modeling the relationship within the data using linear predictor functions.
2. Implementing Linear & Logistic Regression in R by building a model with ‘tenure’ as the dependent variable and multiple independent variables.
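A minimal sketch of a linear model with multiple independent variables, reading p-values and the F-statistic from the summary. The course exercise uses ‘tenure’ from the churn data; because that dataset is not part of this outline, R's built-in mtcars data is used here as a stand-in, so the variable names below are assumptions.

```r
# Stand-in data: built-in mtcars (mpg plays the role of the dependent variable).
fit <- lm(mpg ~ wt + hp, data = mtcars)

fit_summary <- summary(fit)
p_values <- fit_summary$coefficients[, "Pr(>|t|)"]  # one p-value per coefficient
f_stat   <- fit_summary$fstatistic                  # F-statistic and its degrees of freedom

# Predict for a hypothetical new observation.
new_car <- data.frame(wt = 3.0, hp = 110)
predicted_mpg <- predict(fit, newdata = new_car)

# Residual diagnostics: uncomment to draw the normal Q-Q plot.
res <- residuals(fit)
# qqnorm(res); qqline(res)

print(p_values)
print(predicted_mpg)
```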
Module 07 - Logistic Regression
7.1 Introduction to Logistic Regression
7.2 Logistic Regression Concepts, Linear vs Logistic regression, math behind Logistic Regression
7.3 Detailed formulas, logit function and odds, Bi-variate logistic Regression, Poisson Regression
7.4 Building simple “binomial” model and predicting result, confusion matrix and Accuracy, true positive rate, false positive rate, and confusion matrix for evaluating built model, threshold evaluation with ROCR
7.5 Finding the right threshold by building the ROC plot, cross validation & multivariate logistic regression, building logistic models with multiple independent variables
7.6 Real-life applications of Logistic Regression.
1. Implementing predictive analytics by describing the data and explaining the relationship between one dependent binary variable and one or more independent variables.
2. You will use glm() to build a model and use ‘Churn’ as the dependent variable.
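A sketch of a simple binomial model with glm(), followed by a confusion matrix and accuracy at a 0.5 threshold. The built-in mtcars data is used as a stand-in, with the 0/1 column am playing the role that ‘Churn’ plays in the course exercise; that substitution is an assumption made so the example runs on its own.

```r
# Stand-in data: mtcars, where 'am' (0/1) plays the role of 'Churn'.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probabilities, then class labels at a 0.5 threshold.
prob <- predict(fit, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix and overall accuracy on the training data.
conf_mat <- table(predicted = pred, actual = mtcars$am)
accuracy <- mean(pred == mtcars$am)

print(conf_mat)
print(accuracy)
```

Changing the 0.5 cut-off and recomputing the table is a simple way to see how the true positive and false positive rates trade off before moving on to ROC curves.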
Module 08 - Decision Trees & Random Forest
8.1 What is classification and different classification techniques
8.2 Introduction to Decision Tree
8.3 Algorithm for decision tree induction, building a decision tree in R
8.4 Creating a perfect Decision Tree, Confusion Matrix, Regression trees vs Classification trees
8.5 Introduction to ensemble of trees and bagging
8.6 Random Forest concept, implementing Random Forest in R
8.7 What is Naïve Bayes, Computing Probabilities, Impurity Function – Entropy, understand the concept of information gain for right split of node
8.8 Impurity Function – Information gain, understand the concept of Gini index for right split of node
8.9 Impurity Function – Gini index, understand the concept of Entropy for right split of node, overfitting & pruning, pre-pruning, post-pruning, cost-complexity pruning, pruning decision tree and predicting values, finding the right number of trees and evaluating performance metrics.
1. Implementing Random Forest for both regression and classification problems.
2. You will build a tree, prune it using ‘churn’ as the dependent variable, build a Random Forest with the right number of trees, and use ROCR for performance metrics.
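The impurity measures named in this module (entropy, Gini index, information gain) can be computed directly in base R. The sketch below uses a made-up six-row label vector and a hypothetical split into two child nodes; it illustrates the formulas only, not a full tree-building algorithm.

```r
# Entropy of a label vector, in bits.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Gini index of a label vector.
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Made-up parent node and a hypothetical split that separates the classes perfectly.
parent <- c("Yes", "Yes", "Yes", "No", "No", "No")
left   <- c("Yes", "Yes", "Yes")
right  <- c("No", "No", "No")

# Information gain = parent entropy minus the weighted entropy of the children.
info_gain <- entropy(parent) -
  (length(left) / length(parent)) * entropy(left) -
  (length(right) / length(parent)) * entropy(right)

print(c(entropy = entropy(parent), gini = gini(parent), info_gain = info_gain))
```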
Module 09 - Unsupervised learning
9.1 What is Clustering & its Use Cases, what is K-means Clustering
9.2 What is Canopy Clustering
9.3 What is Hierarchical Clustering
9.4 Introduction to Unsupervised Learning
9.5 Feature extraction & clustering algorithms, k-means clustering algorithm
9.6 Theoretical aspects of k-means, and k-means process flow, K-means in R, implementing K-means on the dataset and finding the right number of clusters using Scree-plot
9.7 Hierarchical clustering & Dendrogram, understand Hierarchical clustering, implement it in R and have a look at Dendrograms
9.8 Principal Component Analysis, explanation of Principal Component Analysis in detail, PCA in R, implementing PCA in R.
1. Deploying unsupervised learning with R to achieve clustering and dimensionality reduction
2. K-means clustering for visualizing and interpreting results for the customer churn data.
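A sketch of k-means in base R, including the within-cluster sum of squares that a scree (elbow) plot is drawn from. The course exercise uses the customer churn data; the built-in iris measurements are used here as a stand-in so the example is self-contained.

```r
set.seed(42)  # k-means uses random starting centers

# Stand-in data: the four numeric iris columns, standardized.
features <- scale(iris[, 1:4])

# Total within-cluster sum of squares for k = 1..6 (the scree-plot values).
wss <- sapply(1:6, function(k) {
  kmeans(features, centers = k, nstart = 10)$tot.withinss
})

# Fit with a chosen number of clusters and inspect cluster sizes.
fit3 <- kmeans(features, centers = 3, nstart = 10)
cluster_sizes <- table(fit3$cluster)

# Uncomment to draw the scree plot and look for the "elbow".
# plot(1:6, wss, type = "b", xlab = "Number of clusters", ylab = "Within-cluster SS")

print(round(wss, 1))
print(cluster_sizes)
```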
Module 10 - Association Rule Mining & Recommendation Engine
10.1 Introduction to association rule Mining & Market Basket Analysis
10.2 Measures of Association Rule Mining: Support, Confidence, Lift, Apriori algorithm & implementing it in R
10.3 Introduction to Recommendation Engine
10.4 User-based collaborative filtering & Item-Based Collaborative Filtering, implementing Recommendation Engine in R, user-Based and item-Based
10.5 Recommendation Use-cases
1. Deploying association analysis as a rule-based machine learning method
2. Identifying strong rules discovered in databases using measures of interestingness.
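Support, confidence and lift from 10.2 can be worked through by hand before turning to an Apriori implementation. The sketch below uses five made-up baskets and evaluates the hypothetical rule {bread} => {milk} in base R.

```r
# Five made-up shopping baskets.
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)

# Support: fraction of baskets that contain every item in the itemset.
support <- function(items) {
  mean(sapply(transactions, function(basket) all(items %in% basket)))
}

sup_bread <- support("bread")
sup_milk  <- support("milk")
sup_both  <- support(c("bread", "milk"))

# Rule {bread} => {milk}.
confidence <- sup_both / sup_bread   # P(milk | bread)
lift       <- confidence / sup_milk  # values above 1 suggest a positive association

print(c(support = sup_both, confidence = confidence, lift = lift))
```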
Module 11 - Introduction to Artificial Intelligence
11.1 Introducing Artificial Intelligence and Deep Learning
11.2 What is an Artificial Neural Network, TensorFlow – computational framework for building AI models
11.3 Fundamentals of building ANN using TensorFlow, working with TensorFlow in R.
Module 12 - Time Series Analysis
12.1 What is Time Series
12.2 Techniques and applications, components of Time Series, moving average, smoothing techniques, exponential smoothing
12.3 Univariate time series models, multivariate time series analysis
12.4 ARIMA model
12.5 Time Series in R, sentiment analysis in R (Twitter sentiment analysis), text analysis.
1. Analyzing time series data
2. Working with a sequence of measurements that follows a non-random order to identify the nature of the phenomenon and to forecast future values in the series.
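The moving average and exponential smoothing techniques from 12.2 can be sketched in base R as follows. The sales vector and the smoothing factor alpha = 0.5 are made-up values for illustration.

```r
# Made-up series of six observations.
sales <- c(10, 12, 14, 16, 18, 20)

# Centered 3-point moving average (the first and last values are NA).
ma3 <- stats::filter(sales, rep(1 / 3, 3), sides = 2)

# Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t - 1].
exp_smooth <- function(x, alpha) {
  s <- numeric(length(x))
  s[1] <- x[1]
  for (t in 2:length(x)) {
    s[t] <- alpha * x[t] + (1 - alpha) * s[t - 1]
  }
  s
}
smoothed <- exp_smooth(sales, alpha = 0.5)

print(as.numeric(ma3))
print(smoothed)
```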
Module 13 - Support Vector Machine - (SVM)
13.1 Introduction to Support Vector Machine (SVM)
13.2 Data classification using SVM
13.3 SVM Algorithms using Separable and Inseparable cases
13.4 Linear SVM for identifying margin hyperplane.
Module 14 - Naïve Bayes
14.1 What is Bayes theorem
14.2 What is Naïve Bayes Classifier
14.3 Classification Workflow
14.4 How the Naïve Bayes classifier works, Classifier building in Scikit-learn
14.5 Building a probabilistic classification model using Naïve Bayes, Zero Probability Problem.
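The zero probability problem in 14.5 arises when a feature value never appears with a class in the training data, which forces the whole product of probabilities to zero. A common remedy is Laplace (add-one) smoothing; the base R sketch below shows it on made-up word counts for a single hypothetical class.

```r
# Made-up word counts observed for one class (for example, "spam").
counts <- c(free = 3, win = 2, hello = 0)

# Unsmoothed estimates: "hello" gets probability 0.
raw <- counts / sum(counts)

# Laplace smoothing: add alpha to every count before normalizing.
laplace <- function(counts, alpha = 1) {
  (counts + alpha) / (sum(counts) + alpha * length(counts))
}
smoothed <- laplace(counts)

print(raw)
print(smoothed)
```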
Module 15 - Text Mining
15.1 Introduction to concepts of Text Mining
15.2 Text Mining use cases, understanding and manipulating text with ‘tm’ & ‘stringr’
15.3 Text Mining Algorithms, Quantification of Text
15.4 Term Frequency-Inverse Document Frequency (TF-IDF), After TF-IDF.
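TF-IDF from 15.4 can be computed by hand on a toy corpus. The sketch below uses base R only, three made-up tokenized documents, and the plain definition tf × log(N / df); packages such as ‘tm’ offer their own weighting functions, which may use slightly different variants.

```r
# Three made-up, already tokenized documents.
docs <- list(
  d1 = c("r", "is", "fun"),
  d2 = c("r", "is", "fast"),
  d3 = c("text", "mining", "in", "r")
)
vocab <- sort(unique(unlist(docs)))

# Term frequency: share of each vocabulary term within a document (rows = documents).
tf <- t(vapply(docs, function(d) {
  as.numeric(table(factor(d, levels = vocab))) / length(d)
}, numeric(length(vocab))))
colnames(tf) <- vocab

# Inverse document frequency: log(N / number of documents containing the term).
doc_freq <- colSums(tf > 0)
idf <- log(length(docs) / doc_freq)

# TF-IDF: scale each term's column by its idf.
tfidf <- sweep(tf, 2, idf, `*`)

print(round(tfidf, 3))
```

A term that occurs in every document, such as "r" here, gets an idf of zero and so carries no weight.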
01 – The Market Basket Analysis (MBA) case study
1.1 This case study is associated with the modeling technique of Market Basket Analysis where you will learn about loading of data, various techniques for plotting the items and running the algorithms.
1.2 It includes finding which items go hand in hand and can therefore be grouped together.
1.3 This is used for various real world scenarios like a supermarket shopping cart and so on.
02 – Logistic Regression Case Study
2.1 In this case study you will get a detailed understanding of the advertisement spends of a company that will help to drive more sales
2.2 You will deploy logistic regression to forecast the future trends
2.3 Detect patterns, uncover insights and more all through the power of R programming.
2.4 As a result, future advertisement spends can be decided and optimized for higher revenues.
03 – Multiple Regression Case Study
3.1 You will understand how to compare the miles per gallon (MPG) of a car based on the various parameters.
3.2 You will deploy multiple regression and note down the MPG for car make, model, speed, load conditions, etc.
3.3 It includes model building, model diagnostics, checking the ROC curve, among other things.
04 – Receiver Operating Characteristic (ROC) case study
4.1 You will work with various data sets in R
4.2 Deploy data exploration methodologies
4.3 Build scalable models
4.4 Predict the outcome with highest precision, diagnose the model that you have created with various real world data, check the ROC curve and more.
What projects will I be working on in this Data Science certification course?
Project 01 – Market Basket Analysis
Domain – Inventory Management
Problem Statement – As a new manager in the company, you are assigned the task of increasing cross selling
Topics – Association Rule Mining, Data Extraction, Data Manipulation
- Performing association rule mining
- Understanding where to implement Apriori Algorithm
- Setting association rules with respect to confidence
Project 02 – Credit Card Fraud Detection
Domain – Banking
Problem Statement – Analysis of probability of being involved in a fraudulent operation
Topics – Algorithms, V17 Predictor, Data Visualization, R Language
- Understanding working with the credit card dataset
- Performing data analysis on various labels in the data
- Making use of V17 as predictor and using V14 for analysis
- Plotting score performance with respect to variables
Project 03 – Data Cleaning using Census Dataset
Domain – Government
Problem Statement – Performing Data Cleansing operation on a raw dataset
Topics – Data Analysis, Data preprocessing, Cleaning Ops, Data Visualization, R Language
- Understanding working with the census dataset
- Changing around various with respect to a label to perform analysis
- Creation of functions to eliminate values which are not required
- Verifying the completion of data cleansing operation
Project 04 – Loan Approval Prediction
Problem Statement – Prediction of approval rate of a loan by using multiple labels
Topics – Data Analysis, Data preprocessing, Cleaning Ops, Data Visualization, R Language
- Performing Data Preprocessing
- Building a model and applying PCA
- Building a Naïve Bayes model on the training dataset
- Prediction of values after performing analysis
Project 05 – Book Recommendation System
Domain – E-Commerce
Problem Statement – Creating a model, which can recommend books, based on user interest
Topics – Data Cleaning, Data Visualization, User Based Collaborative Filtering
- Finding the most popular books using various techniques
- Creating a Book Recommender model using User Based Collaborative Filtering
Project 06 – Netflix Recommendation System
Domain – E-Commerce
Problem Statement – Simulating the Netflix Recommendation System
Topics – Data Cleaning, Data Visualization, Distribution, recommenderlab
- Working with raw data
- Using the recommenderlab library in R
- Making use of real data from Netflix
Project 07 – Creating a Pokemon Game using Machine Learning
Domain – Gaming
Problem Statement – Creating a game engine for Pokemon using Machine Learning
Topics – Decision Trees, Regression, Data Cleaning, Data Visualization
- Predicting which Pokemon will win based on Attack vs Defense
- Finding whether a Pokemon is legendary using Decision Trees
- Understanding the dynamics of decision making in Machine Learning
Case Study 01 – Introduction to R Programming
Problem Statement – Working with various operators in R
Topics – Arithmetic Operators, Relational Operators, Logical Operators
- Working with Arithmetic Operators
- Working with Relational Operators
- Working with Logical Operators
Case Study 02 – Solving Customer Churn using Data Exploration
Problem Statement – Understanding what to do to reduce customer churn using Data Exploration
Topics – Data Exploration
- Extracting Individual columns
- Creating and applying filters to manipulate data
- Using loops for redundant operations
Case Study 03 – Creating Data Structures in R
Problem Statement – Implementing various Data Structures in R for various scenarios
Topics – Vectors, list, Matrix, Array
- Creating and Implementing Vectors
- Understanding Lists
- Using Arrays to store Matrices
- Creating and implementing Matrices
Case Study 04 – Implementing SVD in R
Problem Statement – Understanding the use of Singular Value Decomposition in R by making use of the MovieLense Dataset
Topics – 5-fold cross validation, Real Rating Matrix
- Creating a custom recommended movie set for each user
- Creating User Based Collaborative Filtering Model
- Creating RealRatingMatrix for Movie recommendation
Case Study 05 – Time Series Analysis
Problem Statement – Performing TSA and understanding concepts of ARIMA for a given scenario
Topics – Time Series Analysis, R Language, Data Visualization, ARIMA model
- Understand how to fit an ARIMA model
- Plotting PACF charts and finding optimal parameters
- Building the ARIMA model
- Prediction of values after performing analysis