Electronics & ICT Academy IIT Guwahati

Certification in Big Data Analytics

This certification program, offered in collaboration with E&ICT Academy, IIT Guwahati, provides extensive training on Big Data Analytics concepts such as Hadoop, Spark, Python, MongoDB, Data Warehousing, and more. The program aims to give learners a complete experience: understanding the concepts, mastering them thoroughly, and applying them in real life.

Upskill for your dream job!

Sep 27, 2020 Next Batch
9 Months 400+ Hrs Live Classes
3 Guaranteed Interviews
500+ Hiring Partners
Executive Alumni status
24/7 Lifetime Support

Expert Mentors

Mentorship from Professors at E&ICT Academy, IIT Guwahati

Be Future Ready

The Data Analytics sector is expected to grow into a US$16 billion industry by 2025
- NASSCOM

Get Better Pay

A Data Analyst in the US earns an average salary of US$125,000 per year
- Indeed

Who is this program for?

  • Anyone with a bachelor’s degree and a passion for Big Data Analytics
  • Professionals looking to grow their career in Data Analytics and Data Science
  • Analysts and Software Engineers with a bachelor’s degree looking to transition into Data Analytics / Data Engineering
  • Project Managers / Product Managers looking to upskill in Data Analytics and Data Engineering

Skills to Master

  • Big Data
  • Hadoop
  • Spark
  • Statistics
  • Data Science
  • Machine Learning
  • SQL
  • Python
  • Scala
  • Real-time Streaming
  • Tableau
  • Data Mining
  • Business Intelligence

About the Program

This certification program in Big Data Analytics offers academic rigor along with industry exposure. The course is designed and delivered under the mentorship of top faculty members of IIT Guwahati.

PARTNERING WITH E&ICT, IIT GUWAHATI

This Certification Program in Big Data Analytics is run in partnership with E&ICT Academy IIT Guwahati. The E&ICT Academy at IIT Guwahati is an initiative of MeitY (Ministry of Electronics and Information Technology, Govt. of India), formed with a team of IIT Guwahati professors to provide high-quality education programs to working professionals.

Upon completion of this program, you will:

  • Receive a joint certificate from E&ICT Academy, IIT Guwahati and Intellipaat
  • Gain alumni status of E&ICT Academy, IIT Guwahati

PROGRAM IN COLLABORATION WITH IBM

IBM is one of the leading innovators in Big Data analytics tools. Top subject matter experts from IBM will share their knowledge of analytics and Big Data through this training program, helping you gain breadth of knowledge and industry experience.

Benefits for students from IBM

  • Industry-recognized IBM certificates
  • Access to IBM Watson for hands-on training and practice
  • Industry-aligned case studies and project work

Our Career Services

24/7 Support
Mock Interviews & Resume Preparation
Industry-grade Projects
Minimum 3 Exclusive Interviews with 200+ Hiring Partners

Program Curriculum

The core objective of this course is to provide a comprehensive understanding of large volumes of data, including structured, unstructured, text, social media, video, audio, image, bot, and device log data, along with mastery of the technologies used to store, manipulate, and analyze that data and derive insights from it using statistics, Machine Learning algorithms, and Big Data tools.

Big Data Programming Prerequisites: (Self-paced Course)

  • Java programming for MapReduce
  • SQL fundamentals
  • Linux fundamentals
  • Introduction to NoSQL Databases
  • Introduction to NoSQL and MongoDB
  • MongoDB installation
  • Importance of NoSQL
  • CRUD operations
  • Data modeling and schema design
  • Data management and administration
  • Data indexing and aggregation
  • MongoDB security
  • Working with unstructured data
  • Introduction to statistics
  • Logistic regression
  • Decision trees and random forest
  • Data Analytics in Excel
    • Concepts of finance
    • Concepts of economics
    • Hands-on: Inferential statistics, descriptive statistics, simple and multivariate regression, and confidence intervals (see the sketch after this list)
  • Data Analytics Using SQL
    • Introduction to MySQL
    • Working with MySQL and MySQL IDE: Installation and setup
    • Introduction to SQL queries: DDL queries (create and select) and DML queries (alter, insert, etc.)
    • Working with joins, group, and filter
    • Writing complex SQL queries for data retrieval and the import and export of data and database tables
  • Introduction to Python
  • Python basic constructs
  • OOPs in Python
  • NumPy for mathematical computing
  • SciPy for scientific computing
  • Data manipulation
  • Data visualization with Matplotlib
  • Implementing statistical algorithms using Python
  • Hadoop installation and setup
  • Introduction to Big Data and Hadoop
  • Understanding HDFS and MapReduce
  • Deep dive into MapReduce
    • Introduction to Hive
    • Advanced Hive and Impala
    • Introduction to Pig
    • Flume and Sqoop
  • Scala programming
  • Spark framework
  • RDD in Spark
  • DataFrames and Spark SQL
  • Machine Learning using Spark (MLlib)
  • Introduction to PySpark
  • Who uses PySpark?
  • Why Python for Spark?
  • Values, Types, Variables
  • Operands and Expressions
  • Conditional Statements
  • Loops
  • Numbers
  • Python files I/O Functions
  • Strings and associated operations
  • Sets and associated operations
  • Lists and associated operations
  • Tuples and associated operations
  • Dictionaries and associated operations
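
For a taste of the statistics topics listed above (see the hands-on item on inferential statistics), here is a minimal Python sketch using NumPy and SciPy; the sample values are invented purely for illustration:

    import numpy as np
    from scipy import stats

    # Toy sample: hypothetical daily sales figures
    sample = np.array([230.0, 245.5, 210.0, 260.3, 225.8, 240.1, 255.0, 218.9])

    # Descriptive statistics
    mean = sample.mean()
    sd = sample.std(ddof=1)            # sample standard deviation

    # Inferential statistics: 95% confidence interval for the mean
    sem = stats.sem(sample)            # standard error of the mean
    ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)

    # Simple regression: does the day index predict sales?
    days = np.arange(len(sample))
    fit = stats.linregress(days, sample)

    print(f"mean={mean:.2f}, sd={sd:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
    print(f"slope={fit.slope:.3f}, r^2={fit.rvalue**2:.3f}, p={fit.pvalue:.3f}")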

Hands-On:

  • Demonstrating Loops and Conditional Statements
  • Tuples – related operations and properties
  • Lists – operations and related properties
  • Sets – properties and associated operations
  • Dictionaries – operations and related properties
  • Functions
  • Lambda Functions
  • Global Variables, their Scope, and Returning Values
  • Standard Libraries
  • Object-Oriented Concepts
  • Modules Used in Python
  • The Import Statements
  • Module Search Path
  • Package Installation Ways
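
As a quick illustration of the Python constructs covered above (functions, lambdas, the collection types, and imports), here is a minimal, self-contained sketch; all names and values are toy examples:

    from functools import reduce

    def describe(word):
        """A regular function with a return value."""
        return f"{word!r} has {len(word)} characters"

    words = ["hadoop", "spark", "python"]            # list
    unique = set(words)                              # set
    pair = ("spark", 3.5)                            # tuple
    versions = {"spark": 3.5, "hadoop": 3.4}         # dictionary

    lengths = list(map(lambda w: len(w), words))     # lambda with map
    total = reduce(lambda a, b: a + b, lengths)      # lambda with reduce

    for w in words:                                  # loop + conditional
        if w.startswith("s"):
            print(describe(w))
    print(lengths, total, versions["spark"], pair, sorted(unique))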

Hands-On:

  • Lambda – Features, Options, Syntax, Compared with the Functions
  • Functions – Syntax, Return Values, Arguments, and Keyword Arguments
  • Errors and Exceptions – Issue Types, Remediation
  • Packages and Modules – Import Options, Modules, sys Path
  • Why model tuning?
  • What is model tuning?
  • What are parameters?
  • What are hyperparameters?
  • What is hyperparameter tuning?
  • Types of hyperparameter tuning:
  • Grid Search
  • Random Search
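
To make the two tuning strategies concrete, here is a minimal scikit-learn sketch; the RandomForestClassifier, the parameter grid, and the Iris dataset are illustrative choices rather than anything prescribed by the curriculum:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

    X, y = load_iris(return_X_y=True)
    param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

    # Grid search: exhaustively tries every combination in param_grid
    grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
    grid.fit(X, y)

    # Random search: samples a fixed number of combinations (n_iter) at random
    rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                              n_iter=5, cv=5, random_state=0)
    rand.fit(X, y)

    print("grid best:", grid.best_params_, grid.best_score_)
    print("random best:", rand.best_params_, rand.best_score_)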

Hands-On:

  • Performing Grid Search hyperparameter tuning to increase model accuracy
  • Performing Random Search hyperparameter tuning to increase model accuracy
  • Why Ensemble Learning?
  • What is Ensemble Learning?
  • Model Error
  • Bias
  • Variance
  • Reducing Model Error
  • Different Types of Ensemble Learning
  • Bagging
  • Boosting
  • Stacking
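
A minimal scikit-learn sketch of the three ensemble techniques; the base estimators and the Breast Cancer dataset are illustrative assumptions:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                                  RandomForestClassifier, StackingClassifier)
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Bagging: trees trained on bootstrap samples, votes averaged (reduces variance)
    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

    # Boosting: trees built sequentially, each correcting its predecessor (reduces bias)
    boosting = GradientBoostingClassifier(random_state=0)

    # Stacking: a meta-model learns to combine the base models' predictions
    stacking = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("gb", GradientBoostingClassifier(random_state=0))],
        final_estimator=LogisticRegression(max_iter=1000))

    for name, model in [("bagging", bagging), ("boosting", boosting), ("stacking", stacking)]:
        print(name, cross_val_score(model, X, y, cv=5).mean())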

Hands-On:

  • Creating a Bagging classifier to reduce model error using sklearn
  • Creating a Boosting classifier to reduce model error using sklearn
  • Creating a Stacking classifier to reduce model error using sklearn
  • What is Model Deployment?
  • Model Deployment Strategy
  • Steps in Model Deployment
  • Create a model
  • Save it
  • Load it in a web server / web API
  • Make Predictions
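
The deployment steps above can be sketched end to end with Flask and joblib; the model, file name, route, and port below are illustrative, not part of the course material:

    import joblib
    from flask import Flask, jsonify, request
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    # 1. Create a model and 2. save it
    X, y = load_iris(return_X_y=True)
    joblib.dump(LogisticRegression(max_iter=1000).fit(X, y), "model.joblib")

    # 3. Load it in a web server
    app = Flask(__name__)
    model = joblib.load("model.joblib")

    # 4. Make predictions over a web API
    @app.route("/predict", methods=["POST"])
    def predict():
        features = request.get_json()["features"]   # e.g. [5.1, 3.5, 1.4, 0.2]
        return jsonify({"prediction": int(model.predict([features])[0])})

    if __name__ == "__main__":
        app.run(port=5000)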

Hands-On:

  • Creating, saving, and deploying a model using a Python Flask web API
  • Spark Components & its Architecture
  • Spark Deployment Modes
  • Spark Web UI
  • Introduction to PySpark Shell
  • Submitting PySpark Job
  • Writing your first PySpark Job Using Jupyter Notebook
  • What are Spark RDDs?
  • Limitations of existing computing methodologies
  • How do RDDs solve the problem?
  • Ways to create RDDs in PySpark
  • RDD persistence and caching
  • General operations: Transformation, Actions, and Functions
  • Concept of Key-Value pair in RDDs
  • Other pair RDDs and two-pair RDDs
  • RDD Lineage
  • RDD Persistence
  • WordCount Program Using RDD Concepts
  • RDD Partitioning & How it Helps Achieve Parallelization
  • Passing Functions to Spark
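
As an illustration of RDD transformations, actions, key-value pairs, and persistence, here is a minimal PySpark WordCount sketch; the input file name is a placeholder:

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("WordCount").setMaster("local[*]")
    sc = SparkContext(conf=conf)

    counts = (sc.textFile("input.txt")                   # load data into an RDD
                .flatMap(lambda line: line.split())      # transformation: split into words
                .map(lambda word: (word, 1))             # key-value pairs
                .reduceByKey(lambda a, b: a + b)         # aggregate counts per key
                .cache())                                # persist for reuse

    print(counts.take(10))                               # action: triggers the computation
    sc.stop()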

Hands-On:

  • Building and Running Spark Application
  • Spark Application Web UI
  • Loading data in RDDs
  • Saving data through RDDs
  • RDD Transformations
  • RDD Actions and Functions
  • RDD Partitions
  • WordCount program using RDDs in Python
  • Need for Spark SQL
  • What is Spark SQL?
  • Spark SQL Architecture
  • SQL Context in Spark SQL
  • User-Defined Functions
  • DataFrames
  • Interoperating with RDDs
  • Loading Data through Different Sources
  • Performance Tuning
  • Spark-Hive Integration
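
A minimal Spark SQL sketch covering DataFrame creation, a user-defined function, and interoperating with RDDs; the table, column names, and data are invented for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("SparkSQLDemo").getOrCreate()

    # Create a DataFrame from an in-memory list (could equally be JSON, Parquet, or Hive)
    df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])

    # Register a user-defined function and use it in a SQL query
    spark.udf.register("shout", lambda s: s.upper(), StringType())
    df.createOrReplaceTempView("people")
    spark.sql("SELECT shout(name) AS name, age FROM people WHERE age > 30").show()

    # Interoperate with RDDs
    print(df.rdd.map(lambda row: row.name).collect())
    spark.stop()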

Hands-On:

  • Spark SQL – Creating data frames
  • Loading and transforming data through different sources
  • Spark-Hive Integration
  • Introduction to Spark Streaming
  • Features of Spark Streaming
  • Spark Streaming Workflow
  • StreamingContext Initializing
  • Discretized Streams (DStreams)
  • Input DStreams, Receivers
  • Transformations on DStreams
  • DStreams Output Operations
  • Windowed Operators and why they are useful
  • Stateful Operators
  • Vital Windowed Operators
  • Twitter Sentiment Analysis
  • Streaming using Netcat server
  • WordCount program using Kafka-Spark Streaming
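
A minimal sketch of the Netcat-based streaming WordCount using the classic DStream API; the host, port, and batch interval are illustrative (run "nc -lk 9999" in another terminal to act as the Netcat server):

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "NetworkWordCount")
    ssc = StreamingContext(sc, batchDuration=5)          # 5-second micro-batches

    lines = ssc.socketTextStream("localhost", 9999)      # input DStream from the socket
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()                                      # output operation on the DStream

    ssc.start()                                          # start receiving and processing
    ssc.awaitTermination()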

Hands-On:

  • Twitter Sentiment Analysis
  • Streaming using Netcat server
  • WordCount program using Kafka-Spark Streaming
  • Spark-flume Integration
  • Introduction to Machine Learning: What, Why, and Where?
  • Use Case
  • Types of Machine Learning Techniques
  • Why use Machine Learning for Spark?
  • Applications of Machine Learning (general)
  • Applications of Machine Learning with Spark
  • Introduction to MLlib
  • Features of MLlib and MLlib Tools
  • Various ML algorithms supported by MLlib
  • Supervised Learning Algorithms
  • Unsupervised Learning Algorithms
  • ML workflow utilities
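
A minimal MLlib sketch of a supervised learner (logistic regression) using the DataFrame-based API; the toy training data is invented:

    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.linalg import Vectors
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("MLlibDemo").getOrCreate()

    # Toy labelled data: (label, feature vector)
    train = spark.createDataFrame([
        (0.0, Vectors.dense([0.0, 1.1, 0.1])),
        (1.0, Vectors.dense([2.0, 1.0, -1.0])),
        (0.0, Vectors.dense([2.0, 1.3, 1.0])),
        (1.0, Vectors.dense([0.0, 1.2, -0.5])),
    ], ["label", "features"])

    lr = LogisticRegression(maxIter=10, regParam=0.01)   # supervised learning algorithm
    model = lr.fit(train)

    model.transform(train).select("label", "prediction").show()
    spark.stop()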

Hands-On:

  • K-Means Clustering
  • Linear Regression
  • Logistic Regression
  • Decision Tree
  • Random Forest
  • Apache Flume and Apache Kafka
  • Spark Streaming
  • Case Study: Spark vs. Kafka and when to use them (see the Kafka sketch at the end of this list)
  • Creation of multi-node cluster setup using Amazon EC2
  • Hadoop Administration: Cluster configuration
  • Hadoop Administration: Maintenance, monitoring, and troubleshooting
  • Implementing security using Kerberos
  • Maintenance, monitoring, alerting, and troubleshooting Big Data solutions
  • What is data warehousing? What is data mining? Use cases and applications
  • Creating data models for large data warehouses
  • Different types of data models: Star, snowflake, and hybrid; which is the right model?
  • Integration of Hadoop and Spark with an ETL tool
  • Building workflows using Informatica for the integration with HDFS, Hive, MapReduce, etc.
  • Performance Tuning of ETL systems for processing large datasets
  • Introduction to data visualization and the power of Tableau
  • Architecture of Tableau
  • Working with metadata and data blending
  • Creation of sets
  • Working with filters
  • Organizing data and visual analytics
  • Working with mapping
  • Working with calculations and expressions
  • Working with parameters
  • Charts and graphs
  • Dashboards and stories
  • Tableau Prep
  • Integration of Tableau with Big Data tools like Hadoop and Spark
  • Marketing, Web, and Social Media Analytics
  • Fraud and Risk Analytics
  • Supply Chain and Logistics Analytics
  • HR Analytics
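
To make the Kafka ingestion topics concrete (see the case-study item above), here is a minimal produce-and-consume sketch assuming the kafka-python client; the broker address and topic name are assumptions:

    from kafka import KafkaConsumer, KafkaProducer

    # Produce a message to a topic (broker address is illustrative)
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("clickstream", b"page=/home user=42")
    producer.flush()

    # Consume it back, stopping after 5 seconds of inactivity
    consumer = KafkaConsumer("clickstream", bootstrap_servers="localhost:9092",
                             auto_offset_reset="earliest", consumer_timeout_ms=5000)
    for message in consumer:
        print(message.topic, message.value.decode())
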
400+
Hours of Instructor-led Training
100
Hours of Self-paced Videos
7
Guided Projects to Practice
24/7
Lifetime Technical Support

Projects Covered

Twitter Sentiment Analysis

This project involves analyzing people's tweets by extracting key phrases and words, scoring them against a sentiment dictionary, and determining the sentiment they convey on Twitter.

Finding Top Movies Based on the MovieLens Data

This project involves writing a MapReduce program to analyze the MovieLens data and create a list of the top 10 movies, while also using Apache Pig and Apache Hive to work with distributed datasets.

Connecting Pentaho with the Hadoop Ecosystem

This project lets you connect Pentaho with the Hadoop ecosystem as Pentaho works well with HDFS, HBase, Oozie, and ZooKeeper. You will connect the Hadoop cluster with Pentaho Data Integration, Pentaho Analytics, Pentaho Server, and Pentaho Report Designer.

Course Advisor

Suresh Paritala

Solutions Architect at Microsoft, USA

A Senior Software Architect at NextGen Healthcare who has previously worked with IBM Corporation, Suresh Paritala has worked on Big Data, Data Science, Advanced Analytics, Internet of Things, and Azure, along with AI domains like Machine Learning and Deep Learning. He has successfully implemented high-impact projects in these domains.

David Callaghan

Big Data Solutions Architect, USA

An experienced Blockchain Professional who has been bringing integrated Blockchain, particularly Hyperledger and Ethereum, and Big Data solutions to the cloud, David Callaghan has previously worked on Hadoop, AWS Cloud, Big Data and Pentaho projects that have had major impact on revenues of marquee brands around the world.

Muthusamy Manigandan

Head of Engineering, Amazon India

Mani brings deep experience in Algorithms, Data Science, Big Data, and AI. He has worked on multiple research projects in Data Science, AI, and ML for display advertising, recommendation systems, and classification systems. He has more than 16 years of experience building large-scale AI products at top MNCs.

Reviews

John Chioles

Dileep & Ajay

Mr. yoga

Admission Details

The application process consists of three simple steps. An offer of admission will be made to selected candidates based on the feedback from the interview panel. The selected candidates will be notified over email and phone, and they can block their seats through the payment of the admission fee.

Submit Application

Tell us a bit about yourself and why you want to join this program

Application Review

An admission panel will shortlist candidates based on their application

Admission

Selected candidates will be notified within 1–2 weeks

Program Fee
$2,983
Get a chance to win a scholarship of up to US$1,000

I’m Interested in Enrolling
Learn from best-in-class content created and delivered by leading faculty and industry leaders.

Frequently Asked Questions

This program is conducted online over 9 months through multiple live instructor-led training sessions.

After you share your basic details with us, our course advisor will speak to you, and based on that discussion, your application will be screened. If your application is shortlisted, you will need to fill in a detailed application form and attend a telephonic interview conducted by a subject matter expert. Based on your profile and interview, if you are selected, you will receive an admission offer letter.

Completing this program requires 9 months of attending live classes and finishing the assignments and projects along the way.

If, due to any circumstance, you miss a live class, you will receive the recording of the class within the next 12 hours. Also, if you need any support, our 24/7 technical support team is available for any sort of query resolution.

To complete this program, you will need to spare around 6 hours a week for learning. Classes are held over weekends (Sat/Sun), and each session lasts 3 hours.

To ensure that you make the most of this program, you will be given industry-grade projects to work on. This is done to make sure that you get a concrete understanding of what you’ve learned.

Upon completion of this program, you will first prepare for job interviews through mock interview sessions, and then you will get assistance in preparing a resume that meets industry standards. This will be followed by a minimum of 3 exclusive interviews with 200+ hiring partners across the globe.

Upon completion of all of the requirements of the program, you will be awarded a certificate from E&ICT Academy, IIT Guwahati.

Talk To Us

How You Benefit From This Program

  • Unbiased career guidance
  • Counselling based on your skills and preference
  • No repetitive calls; contact only at your convenience
  • Rigorous curriculum designed by industry experts
  • Complete this program while you work

I’m Interested in This Program

Select Currency