Installation of Hadoop components and ecosystems: Hive, Sqoop, Pig, Scala and Spark
Introduction to Big Data, Hadoop and its ecosystem: the importance of Big Data, how Hadoop fits into the framework, MapReduce, Hadoop Distributed File System (HDFS): replication, block size, Secondary NameNode and high availability, and YARN: ResourceManager and NodeManager
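As a rough illustration of how block size and replication interact, the sketch below estimates how many blocks and how much raw disk space a single file takes in HDFS. The function name `hdfs_footprint` is hypothetical, and the 128 MiB block size and replication factor of 3 are assumed defaults that a cluster may override.

```python
def hdfs_footprint(file_size_bytes, block_size_bytes=128 * 1024**2, replication=3):
    """Estimate blocks, block replicas and raw bytes used by one file.

    Assumed defaults (not taken from the course text): 128 MiB blocks
    and a replication factor of 3.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")
    # Ceiling division: the last block may be only partially filled.
    blocks = -(-file_size_bytes // block_size_bytes)
    # Every block is stored `replication` times across DataNodes.
    block_replicas = blocks * replication
    # A partial last block only occupies its actual size on disk,
    # so raw usage is the file size times the replication factor.
    raw_bytes = file_size_bytes * replication
    return blocks, block_replicas, raw_bytes


if __name__ == "__main__":
    blocks, replicas, raw = hdfs_footprint(1024**3)  # a 1 GiB file
    print(blocks, replicas, raw)
```

Under these assumptions, a file just one byte larger than 1 GiB needs an extra block, which is why many small or oddly sized files increase the amount of metadata the NameNode has to track.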
How MapReduce works, how the Reducer works, how the Driver works, combiners, partitioners, input formats, output formats, shuffle and sort
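The stages listed above can be sketched as a small in-memory word count. This is a plain-Python simulation of the data flow, not Hadoop's Java API: the names `mapper`, `combiner`, `partitioner`, `reducer` and `run_job` are illustrative, and `run_job` stands in for the Driver.

```python
import zlib
from collections import defaultdict


def mapper(line):
    # Map: emit (word, 1) for every word in an input line.
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    # Combine: pre-aggregate one mapper's output to reduce shuffle traffic.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partitioner(key, num_reducers):
    # Partition: a stable hash decides which reducer receives a key.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    # Reduce: sum all partial counts for one key.
    return key, sum(values)


def run_job(splits, num_reducers=2):
    # Driver stand-in: wires the stages together for a list of input splits.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        mapped = [pair for line in split for pair in mapper(line)]
        for key, value in combiner(mapped):
            # Shuffle: group values by key inside the target partition.
            partitions[partitioner(key, num_reducers)][key].append(value)
    output = {}
    for part in partitions:
        # Sort: each reducer sees its keys in sorted order.
        for key in sorted(part):
            word, total = reducer(key, part[key])
            output[word] = total
    return output


if __name__ == "__main__":
    print(run_job([["big data big"], ["data hadoop"]]))
```

Each inner list plays the role of one input split handled by one mapper. In a real cluster the shuffle moves data between nodes over the network, which is the cost the combiner is meant to reduce.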
How to create a Hadoop cluster with four nodes, working with the cluster and deploying a MapReduce job, how to write MapReduce code and set up Cloudera Manager
The significance of the configuration files, overview of the configuration values and parameters, the parameters of the Hadoop Distributed File System, setting up the Hadoop environment, detailed configuration files like ‘Include’ and ‘Exclude’, the directory structure and files of the NameNode and DataNode, and the edit log and file system image for Hadoop administration and maintenance
Deploying the checkpoint procedure, working with metadata, data backup, safe mode, NameNode failure and recovery procedure, troubleshooting to resolve various problems, knowing what to look for, node removal and more, the best practices in using the JMX tool for cluster monitoring, working with stack traces, using logs to monitor and troubleshoot, deploying various open-source tools for cluster monitoring, how to deploy the Job Scheduler, the job submission flow in MapReduce, scheduling of jobs on the same cluster, FIFO scheduling and Fair Scheduler configuration
Hadoop advanced administration, Quorum Journal Manager, HDFS security and configuring Hadoop federation, Hadoop platform security fundamentals, the process to secure the Hadoop platform, the importance of Kerberos, integrating Kerberos with the Hadoop platform and Hadoop cluster configuration with Kerberos
Project 1 : Streaming Twitter Data Using Flume
Topics: This project gives you hands-on experience in deploying Apache Flume to extract Twitter streaming data and load it into Hadoop for analysis. You will learn to handle high-volume data spikes, scale horizontally to accommodate increased data volumes, and guarantee data delivery.
Project 2 : Hive and Impala Comparison
Topics: Installation of CDH5 Apache Hive and Apache Impala, comparing the two tools for data querying, the advantages of Hive as a data warehouse for summarization and analysis, and the advantages of Impala as a massively parallel processing, SQL-like query engine for high-speed querying of data in HDFS
Intellipaat is a pioneer in Hadoop training. In this Hadoop Administration training, you will master the concepts of managing, monitoring and troubleshooting large Hadoop clusters and deploying various components on the cluster, such as HDFS, MapReduce and HBase. You will also learn to add new users, authenticate them and secure the cluster. This training course is aligned with the Cloudera CCA Administrator Exam (CCA131).
Intellipaat offers lifetime access to videos and course materials, 24/7 support, and course material upgrades to the latest version at no extra fee. For Hadoop Admin training, you get lifetime access to the Intellipaat Proprietary Virtual Machine and free cloud access for 6 months for performing training exercises. Hence, it is a one-time investment. We are also exclusively partnered with IBM to provide you with IBM Certified Hadoop Professional training.
This course is designed to help you clear the Cloudera CCA Administrator Exam (CCA131). The entire Hadoop administration course content is in line with this certification program and prepares you for jobs in top MNCs. As part of this Hadoop Admin training, you will work on real-time projects and assignments based on real-world industry scenarios, helping you fast-track your career.
At the end of this Hadoop administration training program, there will be quizzes that reflect the type of questions asked in the respective certification exams and help you score better.
The Intellipaat Course Completion Certificate will be awarded upon completion of the project work (after expert review) and upon scoring at least 60% in the quiz. Intellipaat certification is recognized by more than 80 top MNCs, including Ericsson, Cisco, Cognizant, Sony, Mu Sigma, Saint-Gobain, Standard Chartered, TCS, Genpact and Hexaware.
A Senior Software Architect at NextGen Healthcare who has previously worked with IBM Corporation, Suresh Paritala has worked on Big Data, Data Science, Advanced Analytics, Internet of Things and Azure, along with AI domains like Machine Learning and Deep Learning. He has successfully implemented high-impact projects in major corporations around the world.
An experienced Blockchain Professional who has been bringing integrated Blockchain, particularly Hyperledger and Ethereum, and Big Data solutions to the cloud, David Callaghan has previously worked on Hadoop, AWS Cloud, Big Data and Pentaho projects that have had major impact on revenues of marquee brands around the world.