Working of Talend,Introduction to Talend Open Studio and its Usability,What is Meta Data?
Creating a new Job,Concept and creation of Delimited file,Using Meta Data and its Significance,What is propagation?,Data integration schema,Creating Jobs using t-filter row and string filter,Input delimation file creation
Job design and its features,What is a T map?,Data Aggregation,Introduction to triplicate and its Working,Significance and working of tlog,T map and its properties
Extracting data from the source,Source and Target in Database (MySQL),Creating a connection, Importing Schema or Metadata
Calling and using Functions,What are Routines?,Use of XML file in Talend,Working of Format data functions,What is type casting?
Defining Context variable,Learning Parameterization in ETL,Writing an example using trow generator,Define and Implement Sorting,What is Aggregator?,Using t flow for publishing data,Running Job in a loop
Learn to start Trish Server,Connectivity of ETL tool connect with Hadoop,Define ETL method,Implementation of Hive,Data Import into Hive with an example,An example of Partitioning in hive,Reason behind no customer table overwriting?,Component of ETL,Hive vs. Pig,Data Loading using demo customer,ETL Tool,Parallel Data Execution
Big Data, Factors constituting Big Data,Hadoop and Hadoop Ecosystem,Map Reduce -Concepts of Map, Reduce, Ordering, Concurrency, Shuffle, Reducing, Concurrency ,Hadoop Distributed File System (HDFS) Concepts and its Importance,Deep Dive in Map Reduce – Execution Framework, Partitioner Combiner, Data Types, Key pairs,HDFS Deep Dive – Architecture, Data Replication, Name Node, Data Node, Data Flow, Parallel Copying with DISTCP, Hadoop Archives
Installing Hadoop in Pseudo Distributed Mode, Understanding Important configuration files, their Properties and Demon Threads,Accessing HDFS from Command Line
Map Reduce – Basic Exercises,Understanding Hadoop Eco-system,Introduction to Sqoop, use cases and Installation,Introduction to Hive, use cases and Installation,Introduction to Pig, use cases and Installation,Introduction to Oozie, use cases and Installation,Introduction to Flume, use cases and Installation,Introduction to Yarn
Mini Project – Importing Mysql Data using Sqoop and Querying it using Hive
How to develop Map Reduce Application, writing unit test,Best Practices for developing and writing, Debugging Map Reduce applications,Joining Data sets in Map Reduce
A. Introduction to Hive
What Is Hive?,Hive Schema and Data Storage,Comparing Hive to Traditional Databases,Hive vs. Pig,Hive Use Cases,Interacting with Hive
B. Relational Data Analysis with Hive
Hive Databases and Tables,Basic HiveQL Syntax,Data Types ,Joining Data Sets,Common Built-in Functions,Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue
C. Hive Data Management
Hive Data Formats,Creating Databases and Hive-Managed Tables,Loading Data into Hive,Altering Databases and Tables,Self-Managed Tables,Simplifying Queries with Views,Storing Query Results,Controlling Access to Data,Hands-On Exercise: Data Management with Hive
D. Hive Optimization
Understanding Query Performance,Partitioning,Bucketing,Indexing Data
E. Extending Hive
Topics : User-Defined Functions
F. Hands on Exercises – Playing with huge data and Querying extensively.
G. User defined Functions, Optimizing Queries, Tips and Tricks for performance tuning
A. Introduction to Pig
What Is Pig?,Pig’s Features,Pig Use Cases,Interacting with Pig
B. Basic Data Analysis with Pig
Pig Latin Syntax, Loading Data,Simple Data Types,Field Definitions,Data Output,Viewing the Schema,Filtering and Sorting Data,Commonly-Used Functions,Hands-On
Exercise: Using Pig for ETL Processing
C. Processing Complex Data with Pig
Complex/Nested Data Types,Grouping,Iterating Grouped Data,Hands-On Exercise: Analyzing Data with Pig
D. Multi-Data set Operations with Pig
Techniques for Combining Data Sets,Joining Data Sets in Pig,Set Operations,Splitting Data Sets,Hands-On Exercise
E. Extending Pig
Macros and Imports,UDFs,Using Other Languages to Process Data with Pig,Hands-On Exercise: Extending Pig with Streaming and UDFs
F. Pig Jobs
A. Introduction to Impala
What is Impala?,How Impala Differs from Hive and Pig,How Impala Differs from Relational Databases,Limitations and Future Directions Using the Impala Shell
B. Choosing the best (Hive, Pig, Impala)
Putting it all together and Connecting Dots,Working with Large data sets, Steps involved in analyzing large data
How ETL tools work in big data Industry,Connecting to HDFS from ETL tool and moving data from Local system to HDFS,Moving Data from DBMS to HDFS,Working with Hive with ETL Tool,Creating Map Reduce job in ETL tool,End to End ETL PoC showing Hadoop integration with ETL tool.
Major Project, Hadoop Development, cloudera Certification Tips and Guidance and Mock Interview Preparation, Practical Development Tips and Techniques, certification preparation
1. Project – Jobs
Problem Statement – It describes that how to create a job using metadata. For this it includes following actions:
Create XML File,Create Delimited File,Create Excel File,Create Database Connection
2. Hadoop Projects
A. Project – Working with Map Reduce, Hive, Sqoop
Problem Statement – It describes that how to import mysql data using sqoop and querying it using hive and also describes that how to run the word count mapreduce job.
B. Project – Connecting Pentaho with Hadoop Eco-system
Problem Statement – It includes:
Quick Overview of ETL and BI,Configuring Pentaho to work with Hadoop Distribution,Loading data into Hadoop cluster,Transforming data into Hadoop cluster
Extracting data from Hadoop Cluster
Hadoop is the pioneer of Big Data Hadoop training. This career-oriented training in Talend and Hadoop integration will give a definitive advantage when it comes with Big Data Hadoop applications. Talend Open Studio makes it possible to load unstructured data, machine-generated data and Relational data directly into Hadoop. You will master the technique of data integration strategy for Hadoop. This training is in line with clearing the Talend Data Integration Certified Developer Exam.
Intellipaat offers lifetime access to videos, course materials, 24/7 Support, and course material upgrades to latest version at no extra fees. For Hadoop and Spark training you get the Intellipaat Proprietary Virtual Machine for Lifetime and free cloud access for 6 months for performing training exercises. Hence it is clearly a one-time investment.
This course is designed for clearing the : Talend Data Integration Certified Developer Exam
The entire training course content is in line with the certification program and helps you clear the certification exam with ease and get the best jobs in the top MNCs.
As part of this training you will be working on real time projects and assignments that have immense implications in the real world industry scenario thus helping you fast track your career effortlessly.
At the end of this training program there will be quizzes that perfectly reflect the type of questions asked in the certification exam and helps you score better marks in certification exam.
Intellipaat Course Completion certificate will be awarded on the completion of Project work (on expert review)and upon scoring of at least 60% marks in the quiz. Intellipaat certification is well recognized in top 80+ MNCs like Ericsson, Cisco, Cognizant, Sony, Mu Sigma, Saint-Gobain, Standard Chartered, TCS, Genpact, Hexaware, etc.
The entire training course content is in line with respective certification program and helps you clear the requisite certification exam with ease and get the best jobs in the top MNCs.
A Senior Software Architect at NextGen Healthcare who has previously worked with IBM Corporation, Suresh Paritala has worked on Big Data, Data Science, Advanced Analytics, Internet of Things and Azure, along with AI domains like Machine Learning and Deep Learning. He has successfully implemented high-impact projects in major corporations around the world.
An experienced Blockchain Professional who has been bringing integrated Blockchain, particularly Hyperledger and Ethereum, and Big Data solutions to the cloud, David Callaghan has previously worked on Hadoop, AWS Cloud, Big Data and Pentaho projects that have had major impact on revenues of marquee brands around the world.
You will get Lifetime access to high quality interactive tutorials along with life time access to complete Course Material .There will be 24/7 access to video tutorials with email support. If you stuck in any unexpected problem we will provide online interactive sessions with trainer for issue resolving.
We provide 24X7 support by email for issues or doubts clearance for Self-paced training.
In online Instructor led training, trainer will be available to help you out with your queries regarding the course. If required, the support team can also provide you live support by accessing your machine remotely. This ensures that all your doubts and problems faced during labs and project work are clarified round the clock.