R language for statistical programming, the various features of R, introduction to RStudio, the statistical packages, familiarity with different data types and functions, learning to deploy them in various scenarios, using SQL to apply the ‘join’ function, components of RStudio like the code editor, visualization and debugging tools, learning about rbind.
R functions, code and data bundled in a well-defined format called R packages, learning about R package structure, package metadata and testing, CRAN (Comprehensive R Archive Network), vector creation and assigning values to variables.
R functionality, Rep Function, generating Repeats, Sorting and generating Factor Levels, Transpose and Stack Function.
Introduction to matrix and vector in R, understanding the various functions like Merge, Strsplit, Matrix manipulation, rowSums, rowMeans, colMeans, colSums, sequencing, repetition, indexing and other functions.
Understanding subscripts in plots in R, how to obtain parts of vectors, using subscripts with arrays, as logical variables, with lists, understanding how to read data from external files.
Generating plots in R, Graphs, Bar Plots, Line Plots, Histograms, components of a Pie Chart.
Understanding Analysis of Variance (ANOVA) statistical technique, working with Pie Charts, Histograms, deploying ANOVA with R, one way ANOVA, two way ANOVA.
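As a rough illustration of what one-way ANOVA computes, the sketch below derives the F statistic (between-group mean square divided by within-group mean square) by hand. It is written in plain Python purely to show the arithmetic; the function name and the sample groups are hypothetical and not part of the course material. In R, the same analysis would typically go through the built-in aov() function.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical data: three small groups of measurements.
sample_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_value = one_way_anova_f(sample_groups)
```

A large F value suggests that the group means differ by more than the spread within the groups would explain.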
K-Means Clustering for Cluster & Affinity Analysis, Cluster Algorithm, cohesive subsets of items, solving clustering issues, working with large datasets, association rule mining and affinity analysis for data mining, learning co-occurrence relationships.
Introduction to Association Rule Mining, the various concepts of Association Rule Mining, various methods to predict relations between variables in large datasets, the algorithm and rules of Association Rule Mining, understanding single cardinality.
Understanding what Simple Linear Regression is, the equation of a line, slope and Y-intercept of the regression line, deploying analysis using Regression, the least squares criterion, interpreting the results, standard error of the estimate and measures of variation.
Scatter Plots, Two variable Relationship, Simple Linear Regression analysis, Line of best fit
Deep understanding of the measures of variation, the concept of the coefficient of determination, F-Test, the test statistic with an F-distribution, advanced regression in R, prediction with linear regression.
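The least squares criterion and the coefficient of determination from the regression topics above can be sketched in a few lines. This is a minimal Python illustration of the formulas only; the function names and data are hypothetical, and in R the usual route would be lm() followed by summary().

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: share of variation explained by the line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data points.
xs = [1, 2, 3]
ys = [1, 2, 2]
slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
```

The fitted line minimizes the sum of squared vertical distances to the points, and the coefficient of determination compares that residual sum with the total variation in y.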
Logistic Regression Mean, Logistic Regression in R.
Advanced logistic regression, understanding how to do prediction using logistic regression, ensuring the model is accurate, understanding sensitivity and specificity, confusion matrix, what is ROC, a graphical plot illustrating binary classifier system, ROC curve in R for determining sensitivity/specificity trade-offs for a binary classifier.
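To make the confusion matrix, sensitivity and specificity concrete, here is a small Python sketch under the assumption that labels are coded 1 for the positive class and 0 for the negative class. The function names and sample labels are hypothetical and only illustrate the definitions.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives correctly rejected."""
    return tn / (tn + fp)


# Hypothetical labels and classifier predictions.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(actual, predicted)
```

An ROC curve is obtained by repeating this calculation at different classification thresholds and plotting sensitivity against 1 minus specificity.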
Detailed understanding of ROC, area under the ROC Curve, converting the variable, data set partitioning, understanding how to check for multicollinearity (two or more variables being highly correlated), building the model, advanced data set partitioning, interpreting the output, predicting the output, detailed confusion matrix, deploying the Hosmer-Lemeshow test for checking whether the observed event rates match the expected event rates.
Data analysis with R, understanding the Wald test, McFadden’s pseudo R-squared, the significance of the area under the ROC Curve, the Kolmogorov-Smirnov chart, which is based on a non-parametric test of one-dimensional probability distributions.
Connecting to various databases from the R environment, deploying the ODBC tables for reading the data, visualization of the performance of the algorithm using Confusion Matrix.
Creating an integrated environment for deploying R on the Hadoop platform, working with RHadoop, the rmr package and RHIPE (R and Hadoop Integrated Programming Environment), R programming for MapReduce jobs and Hadoop execution.
Logistic Regression Case Study
In this case study you will get a detailed understanding of a company's advertisement spend and how it helps to drive more sales. You will deploy logistic regression to forecast future trends, detect patterns, uncover insights and more, all through the power of R programming. As a result, future advertisement spend can be decided and optimized for higher revenue.
Multiple Regression Case Study
You will understand how to compare the miles per gallon (MPG) of a car based on the various parameters. You will deploy multiple regression and note down the MPG for car make, model, speed, load conditions, etc. It includes the model building, model diagnostic, checking the ROC curve, among other things.
Receiver Operating Characteristic (ROC) case study
You will work with various data sets in R, deploy data exploration methodologies, build scalable models, predict the outcome with highest precision, diagnose the model that you have created with various real world data, check the ROC curve and more.
Classification and Recommendation, Clustering in Mahout, Pattern Mining, Understanding machine Learning, Using Model diagram to decide the approach, Data flow, Supervised and Unsupervised learning
Concept of Recommendation, Recommendations by E-commerce site, Comparison between User Recommendations and Item recommendation, Define recommenders and Classifiers, Process of Collaborative Filtering, Explaining Pearson coefficient algorithm, Euclidean distance measure, Implementing a recommender using map reduce
Defining Clustering, User-to-user similarity, Clustering Illustration, Euclidean distance measure, Distance measure vector, Understanding the process of Clustering, Vectorizing documents-Unstructured data
Document clustering, Sequence-to-sparse Utility, K-Means Clustering
Terminology, Predictor and Target variable, Classifiable Data, Key Challenges in Classification algorithm, Vectorizing Continuous data, Classification Examples, Logistic Regression and its examples
Clustering, Clustering Process, Transaction Clustering, Different techniques of Vectorization, Distance measure, Clustering algorithm: K-Means, Clustering Application-1, Clustering Application-2, Sentiment Analyzer
Pearson Coefficient, Collaborative Filtering Process, Collaborative Filtering, Similarity Algorithms, Pearson Correlation, Euclidean Distance Measure, Frequent Pattern & Association rules, Frequent Pattern Growth
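The two similarity measures named in the recommender and clustering topics can be sketched as follows. This is plain illustrative Python with hypothetical function names and rating vectors; it is not Mahout's implementation, only the underlying formulas.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long rating vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def euclidean(a, b):
    """Euclidean distance between two points or rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical ratings given by two users to the same three items.
user_a = [1, 2, 3]
user_b = [2, 4, 6]
similarity = pearson(user_a, user_b)
distance = euclidean(user_a, user_b)
```

Pearson correlation looks at whether two users rate items in the same pattern, regardless of scale, while Euclidean distance measures how far apart the raw ratings are.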
Introduction to Data Science, Use cases, Need of Business Analytics, Data Science Life Cycle, Different tools available for Data Science
Installing R and R-Studio, R packages, R Operators, if statements and loops (for, while, repeat, break, next), switch case
Importing and Exporting data from external source, Data exploratory analysis, R Data Structure (Vector, Scalar, Matrices, Array, Data frame, List), Functions, Apply Functions
Bar Graph (Simple, Grouped, Stacked), Histogram, Pie Chart, Line Chart, Box (Whisker) Plot, Scatter Plot, Correlogram
Terminologies of Statistics, Measures of Center, Measures of Spread, Probability, Normal Distribution, Binomial Distribution, Hypothesis Testing, Chi-Square Test, ANOVA
Supervised Learning – Linear Regression, Bivariate Regression, Multiple Regression Analysis, Correlation (Positive, negative and neutral), Industrial Case Study, Machine Learning Use-Cases, Machine Learning Process Flow, Machine Learning Categories
What is Classification and its use cases?, What is Decision Tree?, Algorithm for Decision Tree Induction, Creating a Perfect Decision Tree, Confusion Matrix
Random Forest, What is Naive Bayes?
Introduction to Base SAS, Installation of SAS tool, Getting started with SAS, various SAS Windows – Log, Explorer, Output, Search, Editor, etc. working with data sets, overview of SAS Functions, Library Types and programming files
Import/Export Raw Data files, reading and subsetting the data set, various statements like WHERE, SET, MERGE
Hands-on Exercise – Import Excel file in workspace, Read data, Export the workspace to save data
Various SAS Operators – Arithmetic, Logical, Comparison, various SAS Functions – NUMERIC, CHARACTER, IS NULL, CONTAINS, LIKE, Input/Put, Date/Time, Conditional Statements (Do While, Do Until, If, Else)
Hands-on Exercise – Apply logical, arithmetic operators and SAS functions to perform operations
Understanding the Input Buffer and the PDV (Program Data Vector) at the backend, learning what MISSOVER is
Defining and Using KEEP and DROP statements, apply these statements, Format and Labels in SAS.
Hands-on Exercise – Use KEEP and DROP statements
Understanding Delimiter, dataline rules, DLM, Delimiter DSD, raw data files and execution, list input for standard data.
Hands-on Exercise – Use delimiter rules on raw data files
The various standard SAS Procedures built in for popular tasks – PROC SORT, PROC FREQ, PROC SUMMARY, PROC RANK, PROC EXPORT, PROC DATASETS, PROC TRANSPOSE, PROC CORR, etc.
Hands-on Exercise – Use SORT, FREQ, SUMMARY, EXPORT and other procedures
Reading standard and non-standard numeric inputs with Formatted inputs, Column Pointer Controls, Controlling while a record loads, Line pointer control / Absolute line pointer control, Single Trailing @, Multiple IN and OUT statements, DATALINES statement and rules, List Input Method, comparing Single Trailing @ and Double Trailing @@.
Hands-on Exercise – Read standard and non-standard numeric inputs with Formatted inputs, Control while a record loads, Control a Line pointer, Write Multiple IN and OUT statements
SAS FORMAT statements – standard and user-written, associating a format with a variable, working with SAS FORMAT, deploying it with PROC DATASETS, comparing ATTRIB and FORMAT statements.
Hands-on Exercise – Format a variable, deploy a format rule with PROC DATASETS, Use the ATTRIB statement
Understanding PROC GCHART, various Graphs, Bar Charts – Pie, Bar, 3D, plotting variables with PROC GPLOT.
Hands-on Exercise – Plot graphs using PROC GPLOT, Display charts using PROC GCHART
SAS advanced data discovery and visualization, point-and-click analytics capabilities, powerful reporting tools.
Character Functions, Numeric Functions, Converting Variable Type.
Hands-on Exercise – Use Functions in data transformation
Introduction to ODS, Data Optimization, How to generate files (rtf, pdf, html, doc) using SAS
Hands-on Exercise – Optimize data, generate rtf, pdf, html and doc files
Macro Syntax, Macro Variables, Positional Parameters in a Macro, Macro Step
Hands-on Exercise – Write a macro, Use positional parameters
SQL Statements in SAS, SELECT, CASE, JOIN, UNION, Sorting Data
Hands-on Exercise – Create an SQL query to select and add a condition, Use a CASE in a select query
Base SAS web-based interface and ready-to-use programs, advanced data manipulation, storage and retrieval, descriptive statistics.
Hands-on Exercise – Use web UI to do statistical operations
Report Enhancement, Global Statements, User-defined Formats, PROC SORT, ODS Destinations, ODS Listing, PROC FREQ, PROC Means, PROC UNIVARIATE, PROC REPORT, PROC PRINT
Hands-on Exercise – Use PROC SORT to sort the results, List ODS, Find mean using PROC Means, print using PROC PRINT
Domain – Restaurant Revenue Prediction
Data set – Sales
Project Description – This project involves predicting the sales of a restaurant on the basis of certain objective measurements. This project will give real time industry experience on handling multiple use cases and derive the solution. This project gives insights about feature engineering and selection.
Domain – Data Analytics
Objective – To predict the class of a flower from its petal dimensions
Domain – Finance
Objective – The project aims to find the factors that most influence preference for the pre-paid model, and to identify the variables that are highly correlated with those factors
Domain – Stock Market
Objective – This project focuses on Machine Learning by creating predictive data model to predict future stock prices
Project 1 : Augmenting retail sales with Data Science
Industry : Retail
Problem Statement : How to deploy the various rules and algorithms of Data Science for analyzing stationery store purchase data.
Topics : In this project you will deploy various tools of Data Science like association rules, the Apriori algorithm in R, and the support, lift and confidence of an association rule. You will analyze three days of purchase data from the stationery outlet and understand the customer buying patterns across products.
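The three association rule measures mentioned in this project, support, confidence and lift, can be illustrated with a short sketch. It is plain Python with hypothetical function names and a made-up set of stationery baskets, intended only to show the definitions; the project itself uses the Apriori algorithm in R (for example via the arules package).

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Share of transactions containing `lhs` that also contain `rhs`."""
    both = set(lhs) | set(rhs)
    return support(transactions, both) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Confidence of lhs -> rhs relative to the baseline support of rhs."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"pen", "notebook"},
    {"pen", "eraser"},
    {"pen", "notebook", "eraser"},
    {"notebook"},
]
rule_support = support(baskets, {"pen", "notebook"})
rule_confidence = confidence(baskets, {"pen"}, {"notebook"})
rule_lift = lift(baskets, {"pen"}, {"notebook"})
```

A lift above 1 suggests the two items are bought together more often than chance would predict, while a lift below 1, as in this toy data, suggests the opposite.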
Project 2 : Increasing revenues of a retail enterprise using Data Science tools
Industry : Retail
Problem Statement : How to successfully convert visitors into buyers for retail enterprise
Topics : In this Data Science project you will learn how to reduce the churn rate, recommend the right items for the customers for the ecommerce website and place the right products in the physical store. You will learn how to optimize the limited budget for marketing using the right Data Science techniques.
Project 3 : Analyzing pre-paid model of stock broking
Industry : Finance
Problem Statement: Finding out the deciding factor for people to opt for the pre-paid model of stock broking.
Topics: In this Data Science project you will learn about the variables that are highly correlated in the pre-paid brokerage model, analyze various market opportunities, and develop targeted promotion plans for products sold under various categories. You will also do competitor analysis and weigh the advantages and disadvantages of the pre-paid model.
Project 4 : Cold Start Problem in Data Science
Problem Statement: how to build a recommender system without the historical data available
Topics: This project involves understanding of the cold start problem associated with the recommender systems. You will gain hands-on experience in information filtering, working on systems with zero historical data to refer to, as in the case of launching a new product. You will gain proficiency in working with personalized applications like movies, books, songs, news and such other recommendations. This project includes the various ways of working with algorithms and deploying other data science techniques.
Project 5 : Recommendation for Movie, Summary
Topics : This is a real-world project that gives you hands-on experience in working with a movie recommender system. Depending on what movies are liked by a particular user, you will be in a position to provide data-driven recommendations. This project involves understanding recommender systems, information filtering, predicting ‘rating’, learning about user ‘preference’ and so on. You will work exclusively on data related to user details, movie details and others.
Project 6 : Making sense of customer online buying pattern
Industry : ecommerce
Problem Statement : An ecommerce company wants to know how to deploy targeted selling to its customers
Description : This Data Science project involves learning from the customer buying habits and selling them the products that they need. As part of the project you will aggregate, cleanse, transform and load the data of customer buying history. Then you will deploy statistical analysis, predictive modeling and create profiles of customers to implement targeted selling to them.
Project 7 : Fraud Detection in Banking System
Industry : Banking and Finance
Problem Statement : A major bank wants to deploy data science to detect fraudulent activities and take remedial actions before it is too late
Description : This data science project will help you understand how you can look for fraudulent activities in a banking ecosystem. You will work with banking transactional data, look for outliers in the data, classify this data based on various parameters, apply statistics and come up with inferences to look for rogue transactions and mitigate the risk before it is too late.
The Market Basket Analysis (MBA) case study
This case study is associated with the modeling technique of Market Basket Analysis where you will learn about loading of data, various techniques for plotting the items and running the algorithms. It includes finding out what are the items that go hand in hand and hence can be clubbed together. This is used for various real world scenarios like a supermarket shopping cart and so on.
Project – Data Analysis Project
Data – Sales
Problem Statement – It includes the following actions:
Understand the business solutions, Discussion with the warehouse team, Data Collection & Storage, Data Cleaning, Build a Hypothesis Tree around the business problem, Produce the final result.
Project 1 – Build analytical solution for patients taking medicines
Domain: Health Care
Objective – This project aims to find descriptive statistics and subsets for specific clinical data problems. It will give you a brief insight into Base SAS procedures and DATA steps.
Project 2 – Build revenue projections reports
Objective – This project will give you hands-on experience in working with the SAS data analytics and business intelligence tool. You will be working on the data entered in a business enterprise setup, aggregate, retrieve and manage that data. You will learn to create insightful reports and graphs and come up with statistical and mathematical analysis to scientifically predict the revenue projection for a particular future time frame. Upon completion of the project you will be well-versed in the practical aspects of data analytics, predictive modeling, and data mining.
Domain: Finance Market
Objective – The project aims to find the factors that most influence preference for the pre-paid model, and to identify the variables that are highly correlated with those factors
Objective – k-Means cluster analysis on the Iris dataset to predict the class of a flower from its petal dimensions
Intellipaat is a leader in providing Data Science training. Become proficient in implementing sophisticated business and data analytics models using concepts of Data Science, R Programming, Apache Mahout, and Statistics and Probability. This training course is fully aligned towards clearing the CCP Data Scientist Cloudera certification (CCP:DS).
You will be working on real time projects that have high relevance in the corporate world, step by step assignments and curriculum designed by industry experts. Upon completion of the training course you can apply for some of the best jobs in top MNCs around the world at top salaries. Intellipaat offers lifetime access to videos, course materials, 24/7 Support, and course material upgrading to latest version at no extra fees. Hence it is clearly a one-time investment.
This course is designed for clearing the SAS Certified Base Programmer certification exam. The entire training course content is in line with respective certification program and helps you clear the requisite certification exam with ease and get the best jobs in the top MNCs.
As part of this training you will be working on real time projects and assignments that have immense implications in the real world industry scenario thus helping you fast track your career effortlessly.
At the end of this training program there will be quizzes that reflect the type of questions asked in the respective certification exams and help you score better marks in them.
Intellipaat R, Mahout, Data Science and the Intellipaat Course Completion certificate will be awarded on the completion of Project work (upon expert review) and on scoring of at least 60% marks in the quiz. Intellipaat certification is well recognized in top 80+ MNCs like Ericsson, Cisco, Cognizant, Sony, Mu Sigma, Saint-Gobain, Standard Chartered, TCS, Genpact, Hexaware, etc.
This course is designed for clearing the SAS Certified Base Programmer certification and for earning the Intellipaat R, Mahout, SAS Certified Base Programmer and Intellipaat Course Completion certificates.
At the end of the course there will be a quiz and project assignments. Once you complete them you will be awarded the Intellipaat Course Completion certificate. Intellipaat enjoys strong relationships with multiple staffing companies in the US and UK and has 80+ clients across the globe. If you are looking to explore job opportunities, you can pass on your resume once you complete the course and we will help you with job assistance. We don’t charge any extra fees for passing the resume to our partners and clients.
"PMI®", "PMP®" and "PMI-ACP®" are registered marks of the Project Management Institute, Inc.