What is Big Data, Where Does Hadoop Fit In, Hadoop Distributed File System – Replication, Block Size, Secondary NameNode, High Availability, Understanding YARN – ResourceManager, NodeManager, Difference between Hadoop 1.x and 2.x
Hadoop 2.x Cluster Architecture, Federation and High Availability, A Typical Production Cluster Setup, Hadoop Cluster Modes, Common Hadoop Shell Commands, Hadoop 2.x Configuration Files, Cloudera Single-Node Cluster
How MapReduce Works, How the Reducer Works, How the Driver Works, Combiners, Partitioners, Input Formats, Output Formats, Shuffle and Sort, Map-Side Joins, Reduce-Side Joins, MRUnit, Distributed Cache
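To see how the mapper, reducer, combiner and driver listed above fit together, here is a minimal WordCount sketch in Java using the standard org.apache.hadoop.mapreduce API. It is a hedged illustration: the class names and the choice of reusing the reducer as the combiner are assumptions for the sketch, not the course's reference solution.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word. It can also be registered as a
  // combiner because summation is associative and commutative.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: wires the mapper, combiner and reducer into a job and submits it.
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // combiner runs map-side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Reusing the reducer as a combiner is valid here only because summing partial counts gives the same result as summing raw counts; a reducer whose logic is not associative and commutative cannot be reused this way.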
Working with HDFS, Writing a WordCount Program, Writing a Custom Partitioner, MapReduce with a Combiner, Map-Side Joins, Reduce-Side Joins, Unit Testing MapReduce, Running MapReduce in Local Job Runner Mode
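For the custom partitioner exercise, one possible sketch (the class name AlphabetPartitioner and the routing rule are assumptions for illustration, not part of the course material) subclasses Partitioner to control which reducer each key is sent to:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative rule only: words starting with 'a'..'m' go to partition 0,
// everything else to partition 1, assuming at least two reducers are set.
public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    String word = key.toString();
    if (numPartitions <= 1 || word.isEmpty()) {
      return 0; // single reducer or empty key: everything to partition 0
    }
    char first = Character.toLowerCase(word.charAt(0));
    return (first >= 'a' && first <= 'm') ? 0 : 1;
  }
}
```

A driver would register it with job.setPartitionerClass(AlphabetPartitioner.class) and call job.setNumReduceTasks(2) so that both partitions are actually used.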
What is a Graph, Graph Representation, Breadth-First Search Algorithm, Representing Graphs in MapReduce, Implementing Graph Algorithms with MapReduce, Example of a Graph MapReduce Job
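Graph algorithms such as breadth-first search are usually expressed on MapReduce as repeated passes over an adjacency-list representation: each pass pushes tentative distances one hop outward, and the reducer keeps the minimum distance seen per node. The sketch below shows one such pass in Java; the record format "nodeId&lt;TAB&gt;neighbor1,neighbor2|distance" (with distance -1 meaning not yet reached) and the class names are assumptions made for illustration, not the course's exact example.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// One BFS pass. Input lines look like: "nodeId<TAB>neighbor1,neighbor2|distance",
// where distance is -1 for "not reached yet". The job is rerun until no
// distance changes between iterations.
public class BfsIteration {

  public static class FrontierMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split("\t");
      String nodeId = parts[0];
      String[] adjAndDist = parts[1].split("\\|");
      String adjacency = adjAndDist[0];
      int distance = Integer.parseInt(adjAndDist[1]);

      // Always re-emit the node's structure so the reducer can rebuild it.
      context.write(new Text(nodeId), new Text(adjacency + "|" + distance));

      // If this node has been reached, propose distance+1 to every neighbor.
      if (distance >= 0 && !adjacency.isEmpty()) {
        for (String neighbor : adjacency.split(",")) {
          context.write(new Text(neighbor), new Text("DIST|" + (distance + 1)));
        }
      }
    }
  }

  public static class MinDistanceReducer
      extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      String adjacency = "";
      int best = -1; // -1 means still unreached
      for (Text value : values) {
        String[] parts = value.toString().split("\\|");
        if ("DIST".equals(parts[0])) {
          int proposed = Integer.parseInt(parts[1]);      // proposed distance
          if (best < 0 || proposed < best) best = proposed;
        } else {
          adjacency = parts[0];                           // node structure record
          int d = Integer.parseInt(parts[1]);
          if (d >= 0 && (best < 0 || d < best)) best = d;
        }
      }
      context.write(key, new Text(adjacency + "|" + best));
    }
  }
}
```

A driver would rerun this job, feeding each iteration's output back in as the next iteration's input, and stop once a counter shows that no node's distance changed.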
Exercise 1, Exercise 2, Exercise 3
A. Introduction to Pig
Understanding Apache Pig, its features and various uses, and learning to interact with Pig
B. Deploying Pig for data analysis
The syntax of Pig Latin, key definitions, sorting and filtering data, data types, deploying Pig for ETL, loading data, viewing schemas, defining fields, and commonly used functions.
C. Pig for complex data processing
Various data types, including nested and complex types, processing data with Pig, iterating over grouped data, practical exercise
D. Performing multi-dataset operations
Joining data sets, splitting data sets, various methods of combining data sets, set operations, hands-on exercise
E. Extending Pig
Understanding user-defined functions (UDFs), processing data with other languages, imports and macros, using streaming and UDFs to extend Pig, practical exercises
F. Pig Jobs
Working with real data sets from Walmart and Electronic Arts as case studies
A. Hive Introduction
Understanding Hive, comparing Hive with traditional databases, comparing Pig and Hive, storing data in Hive and the Hive schema, interacting with Hive, and various use cases of Hive
B. Hive for relational data analysis
Understanding HiveQL, basic syntax, tables and databases, data types, joining data sets, various built-in functions, and deploying Hive queries from scripts, the shell, and Hue.
C. Data management with Hive
Databases and database creation, data formats in Hive, data modeling, Hive-managed tables, self-managed tables, loading data, altering databases and tables, simplifying queries with views, storing query results, controlling data access, managing data with Hive, the Hive Metastore, and the Thrift server.
D. Optimization of Hive
Learning about query performance, data indexing, partitioning, and bucketing
E. Extending Hive
Deploying user-defined functions to extend Hive
F. Hands-on Exercises – Working with Large Data Sets and Extensive Querying
Deploying Hive on huge data sets with extensive querying
G. UDFs and Query Optimization
Working extensively with user-defined functions, learning how to optimize queries, and various methods of performance tuning.
A. Introduction to Impala
What is Impala, How Impala Differs from Hive and Pig, How Impala Differs from Relational Databases, Limitations and Future Directions, Using the Impala Shell
B. Choosing the Best (Hive, Pig, Impala)
C. Modeling and Managing Data with Impala and Hive
Data Storage Overview, Creating Databases and Tables, Loading Data into Tables, HCatalog, Impala Metadata Caching
D. Data Partitioning
Partitioning Overview, Partitioning in Impala and Hive
E. File Formats
Selecting a File Format, Tool Support for File Formats, Avro Schemas, Using Avro with Hive and Sqoop, Avro Schema Evolution, Compression
What is HBase, Where Does It Fit In, What is NoSQL
Multi-Node Cluster Setup Using Amazon EC2 – Creating a 4-Node Cluster, Running MapReduce Jobs on the Cluster
How ETL tools work in the big data industry, connecting to HDFS from an ETL tool and moving data from the local system to HDFS, moving data from a DBMS to HDFS, working with Hive from an ETL tool, creating a MapReduce job in an ETL tool, and an end-to-end ETL PoC showing big data integration with an ETL tool.
Major project, Hadoop development, Cloudera certification tips and guidance, mock interview preparation, practical development tips and techniques, and certification preparation.
Project 1 – Working with MapReduce, Hive, Sqoop
Problem Statement – It describes how to import MySQL data using Sqoop, query it using Hive, and run the WordCount MapReduce job.
Project 2 – Connecting Pentaho with Hadoop Eco-system
Problem Statement – It includes:
Topics: Quick overview of ETL and BI, configuring Pentaho to work with the Hadoop distribution, loading data into the Hadoop cluster, transforming data within the Hadoop cluster, extracting data from the Hadoop cluster
Intellipaat is a leader in Hadoop online training. This Hadoop analyst training will help you become fully proficient as a data analyst, able to collect, analyze and transform huge volumes of data on a Hadoop cluster using powerful tools like SQL and other scripting languages. Upon successful completion of the training you will be awarded the Intellipaat Hadoop Analyst certification.
Intellipaat offers lifetime access to videos, course materials, 24/7 support, and course material upgrades to the latest version at no extra fee. For Hadoop and Spark training you get the Intellipaat proprietary virtual machine for lifetime use and free cloud access for 6 months for performing the training exercises. It is therefore clearly a one-time investment. We are also exclusively partnered with IBM to provide IBM Certified Hadoop Professional training.
Intellipaat offers self-paced training and online instructor-led training. We also provide corporate training for enterprises. All our trainers have over 12 years of industry experience in relevant technologies and are subject matter experts working as consultants. You can assess the quality of our trainers in the sample videos provided.
If you have any queries you can contact our 24/7 dedicated support to raise a ticket. We provide email support and resolution to your queries. If a query is not resolved by email, we can arrange a one-on-one session with our trainers. The best part is that you can contact Intellipaat even after completing the training to get support and assistance. There is also no limit on the number of queries you can raise for doubt clearance and query resolution.
Yes, you can learn Hadoop without being from a software background. We provide complimentary courses in Java and Linux so that you can brush up on your programming skills. This will help you in learning Hadoop technologies better and faster.
The Intellipaat self-paced training is for people who want to learn at their own pace. As part of this program we provide you with one-on-one sessions, doubt clearance over email, 24/7 live support, one year of cloud access, and lifetime LMS access with upgrades to the latest version at no extra cost. Self-paced training can cost 75% less than online instructor-led training. Should you face any unexpected challenges while studying, we will arrange a virtual live session with the trainer.
We provide you with the opportunity to work on real-world projects where you can apply the knowledge and skills you acquired through our training. We have multiple projects that thoroughly test your skills and knowledge of various Hadoop components, making you industry-ready. These projects span exciting and challenging fields such as banking, insurance, retail, social networking and high technology. The Intellipaat projects are equivalent to six months of relevant experience in the corporate world.
Yes, Intellipaat does provide you with placement assistance. We have tie-ups with 80+ organizations including Ericsson, Cisco, Cognizant, TCS, among others that are looking for Hadoop professionals and we would be happy to assist you with the process of preparing yourself for the interview and the job.
Yes, if you want to upgrade from self-paced training to instructor-led training, you can do so by paying the difference in fees and joining the next batch of classes, which will be notified to you separately.
Upon successful completion of the training you take a set of quizzes and complete the projects; after review, and on scoring over 60% in the qualifying quiz, the official Intellipaat verified certificate is awarded. The Intellipaat certification is a seal of approval and is highly recognized in 80+ corporations around the world, including many in the Fortune 500.
This course is designed for clearing the Intellipaat Hadoop Analyst exam.
As part of this training you will work on real-time projects and assignments that have immense relevance in real-world industry scenarios, helping you fast-track your career.
At the end of this training program there will be a quiz that perfectly reflects the type of questions asked in the certification exam and helps you score better marks.
The certification is awarded on completion of the assignments and project work (upon expert review) and on scoring at least 60% in the quiz. The Intellipaat certification is well recognized in 80+ top MNCs like Ericsson, Cisco, Cognizant, Sony, Mu Sigma, Saint-Gobain, Standard Chartered, TCS, Genpact, Hexaware, etc.
"PMI®", "PMP®" and "PMI-ACP®" are registered marks of the Project Management Institute, Inc.
The Open Group®, TOGAF® are trademarks of The Open Group.
The Swirl logoTM is a trade mark of AXELOS Limited.
ITIL® is a registered trade mark of AXELOS Limited.
PRINCE2® is a Registered Trade Mark of AXELOS Limited.
Certified ScrumMaster® (CSM) and Certified Scrum Trainer® (CST) are registered trademarks of SCRUM ALLIANCE®
Professional Scrum Master is a registered trademark of Scrum.org