This is an all-in-one ETL Tools training which is a combo course designed by including individual courses of Informatica, SSIS, OBIEE, Talend with Hadoop Connectivity, DataStage, Pentaho with Hadoop Connectivity. This training will give you a powerful head start when it comes to the Extract, Transform, and Load processes that exclusively cater to the Business Intelligence domain.
Introduction to data warehousing, what is ETL, and overview of data cleansing, data scrubbing, data aggregation, learn what is Informatica PowerCenter ETL.
Overview of Informatica configuration, Integration Services, Installation of Informatica, operational administration activities.
Learn what is active and passive transformation and the differences between the two.
Learning about expression transformation, connected passive transformation to calculate value on a single row.
The different types of transformations like Sorter, Sequence Generator and Filter, the characteristics of each and where they are used.
Joiner transformation to bring data from heterogeneous data sources.
Understanding the Ranking and Union transformation, the characteristics and deployment.
Learn about the rank and dense rank functions and the syntax for them.
Understanding how router transformation works and its key features.
Lookup transformation Overview and different types of lookup transformation:Connected, unconnected, Dynamic and Static
What is SCD?,processing in xml, learn how to handle flat file, list and define various transformations, implementing ‘for loop’ in power center, concepts of pushdown optimization and partitioning, what is constraint based loading?, what is incremental aggregation?
Different types of designer: Mapplet, Worklet, target load plan, loading to multiple targets, linking property
Objectives of performance tuning, defining performance tuning, learning the sequence for tuning
Managing repository, repository manager – client tool, functionalities of previous versions, important tasks in repository manager
Understanding and adopting best practices for managing repository.
Common tasks in workflow manager, creating dependencies, scope of workflow monitor, define variable and parameter in Informatica.
parameter files and their scope, parameter of mapping, worklet and session parameter, workflow and service variable, basic development errors
session and workflow log, using debuggers, error handling framework in informatica, failover and high availability
configurations and mechanisms in recovery, checking health of powercenter environment
infacmd, pmrep, infasetup, processing of flat file
Fixed length and delimited, expression transformations- sequence numbers, dynamic targeting using transaction control.
Dynamic target with use of transaction control, indirect loading.
Importance of Java transformations to extend Power Center capabilities, transforming data, active and passive mode.
Understanding unconnected stored procedure in Informatica, the different scenarios of unconnected stored procedure usage.
Use of SQL transformation (active and passive)
Understanding Incremental Loading and aggregation and comparison between them .
Working with database constraints using PowerCenter, understanding constraint based loading and target load order.
The various types of XML transformation in Informatica, configuring a lookup as active.
Understanding what is data profiling in Informatica, its significance in validating content, ensuring quality and structure of data as per business requirements.
Understanding workflow as a group of instruction/command for integration services, learning how to create and delete workflow in Informatica.
Understanding the Database Connection, creating a new database connection in Informatica, the various steps involved.
Working with relational database tables in Informatica, mapping for loading data from flat files to relational database files.
Understanding how to deploy PowerCenter for seamless LinkedIn connectivity with Informatica PowerCenter.
Connecting Informatica PowerCenter with various data sources like Social media channels like Facebook, Twitter, etc.
Push down optimization for load-balancing on the server for better performance, the various types of partitioning for optimizing performance.
Understanding session cache, importance of cache creation with, implementing session cache, calculating cache requirement
Introduction to Business Intelligence, understanding the concept of Data Modeling, Data Cleaning, learning about Data Analysis, Data Representation, Data Transformation.
Introduction to ETL, the various steps involved Extract, Transform, Load, using a user’s email ID to read a flat file, extracting the User ID from email ID, loading the data into a database table.
Introduction to Connection Managers – logical representation of a connection, the various types of Connection Managers – Flat file, database, understanding how to load faster with OLE DB, comparing the performance of OLE DB and ADO.net, learning about Bulk Insert, working with Excel Connection Managers and identifying the problems.
Learning what is Data Transformation, converting data from one format to another, understanding the concepts of Character Map, Data Column and Copy Column Transformation, import and export column transformation, script and OLEDB Command Transformation, understanding row sampling, aggregate and sort transformation, percentage and row sampling.
Understanding Pivot and UnPivot Transformation, understanding Audit and Row Count Transformation, working with Split and Join Transformation, studying Lookup and Cache Transformation.
Understanding data that slowly changes over time, learning the process of how new data is written over old data, best practices.Detail explanation of three types of SCDs –Type1, Type2 and Type3, and their differences.
Understanding how Fuzzy Lookup Transformation varies from Lookup Transformation, the concept of Fuzzy matching,
Learning about error rows configuration, package logging, defining package configuration, understanding constraints and event handlers.
What are data models and why you need them? The scope, reach and benefits of data modeling, the business requirement intrinsic in data modeling, various case studies, the data modeling implications and the impact of data modeling on business intelligence.
Introduction to Business Intelligence, the architecture of data flow, stack description of BI technology, need for reporting in business, distinction between OLTP and OLAP, the BI platform in BI technology stack, the product and dimension hierarchy, multidimensional and relational analytical processing, types of Reports, multidimensional modelling.
Online Analytical Processing, the important concepts & terminology, significance of OLAP in business intelligence life cycle, understanding various data schemas like star, snow flake and constellation, aggregate and calculated measures.
Introduction to Oracle Business Intelligence Enterprise Edition, overview of the OBIEE 11g product, the Architecture of OBIEE, key features and components.
Understanding what is Oracle Business Intelligence Repository, installation of OBIEE on Windows system, directory structure installation, services.
Understanding how to build a Business Model and Mapping Layer in BI Repository, creating the Presentation Layer.
working with the Enterprise Manager, testing and validating the Repository, cache disabling.
Working with the Repository, creating Test Report, adding calculations, deploying OBIEE analysis, coming up with landing page UI and its features.
Learning about the Oracle BI Presentation Catalog, accessing and managing objects, Report archiving and exporting, data grouping and limiting in analyses, data formatting, conditional formatting.
The OBIEE dashboard setup, basics of dashboard and dashboard pages, deploying Dashboard Builder for building Dashboards, editing, sharing, and saving Dashboard analysis.
Securing the Oracle Business Intelligence Suite with Enterprise Manager, creating alerts, managing grouping and maintenance.
Working of Talend,Introduction to Talend Open Studio and its Usability,What is Meta Data?
Creating a new Job,Concept and creation of Delimited file,Using Meta Data and its Significance,What is propagation?,Data integration schema,Creating Jobs using t-filter row and string filter,Input delimation file creation
Job design and its features,What is a T map?,Data Aggregation,Introduction to triplicate and its Working,Significance and working of tlog,T map and its properties
Extracting data from the source,Source and Target in Database (MySQL),Creating a connection, Importing Schema or Metadata
Calling and using Functions,What are Routines?,Use of XML file in Talend,Working of Format data functions,What is type casting?
Defining Context variable,Learning Parameterization in ETL,Writing an example using trow generator,Define and Implement Sorting,What is Aggregator?,Using t flow for publishing data,Running Job in a loop
Learn to start Trish Server,Connectivity of ETL tool connect with Hadoop,Define ETL method,Implementation of Hive,Data Import into Hive with an example,An example of Partitioning in hive,Reason behind no customer table overwriting?,Component of ETL,Hive vs. Pig,Data Loading using demo customer,ETL Tool,Parallel Data Execution
Big Data, Factors constituting Big Data,Hadoop and Hadoop Ecosystem,Map Reduce -Concepts of Map, Reduce, Ordering, Concurrency, Shuffle, Reducing, Concurrency ,Hadoop Distributed File System (HDFS) Concepts and its Importance,Deep Dive in Map Reduce – Execution Framework, Partitioner Combiner, Data Types, Key pairs,HDFS Deep Dive – Architecture, Data Replication, Name Node, Data Node, Data Flow, Parallel Copying with DISTCP, Hadoop Archives
Installing Hadoop in Pseudo Distributed Mode, Understanding Important configuration files, their Properties and Demon Threads,Accessing HDFS from Command Line
Map Reduce – Basic Exercises,Understanding Hadoop Eco-system,Introduction to Sqoop, use cases and Installation,Introduction to Hive, use cases and Installation,Introduction to Pig, use cases and Installation,Introduction to Oozie, use cases and Installation,Introduction to Flume, use cases and Installation,Introduction to Yarn
Mini Project – Importing Mysql Data using Sqoop and Querying it using Hive
How to develop Map Reduce Application, writing unit test,Best Practices for developing and writing, Debugging Map Reduce applications,Joining Data sets in Map Reduce
A. Introduction to Hive
What Is Hive?,Hive Schema and Data Storage,Comparing Hive to Traditional Databases,Hive vs. Pig,Hive Use Cases,Interacting with Hive
B. Relational Data Analysis with Hive
Hive Databases and Tables,Basic HiveQL Syntax,Data Types ,Joining Data Sets,Common Built-in Functions,Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue
C. Hive Data Management
Hive Data Formats,Creating Databases and Hive-Managed Tables,Loading Data into Hive,Altering Databases and Tables,Self-Managed Tables,Simplifying Queries with Views,Storing Query Results,Controlling Access to Data,Hands-On Exercise: Data Management with Hive
D. Hive Optimization
Understanding Query Performance,Partitioning,Bucketing,Indexing Data
E. Extending Hive
Topics : User-Defined Functions
F. Hands on Exercises – Playing with huge data and Querying extensively.
G. User defined Functions, Optimizing Queries, Tips and Tricks for performance tuning
A. Introduction to Pig
What Is Pig?,Pig’s Features,Pig Use Cases,Interacting with Pig
B. Basic Data Analysis with Pig
Pig Latin Syntax, Loading Data,Simple Data Types,Field Definitions,Data Output,Viewing the Schema,Filtering and Sorting Data,Commonly-Used Functions,Hands-On
Exercise: Using Pig for ETL Processing
C. Processing Complex Data with Pig
Complex/Nested Data Types,Grouping,Iterating Grouped Data,Hands-On Exercise: Analyzing Data with Pig
D. Multi-Data set Operations with Pig
Techniques for Combining Data Sets,Joining Data Sets in Pig,Set Operations,Splitting Data Sets,Hands-On Exercise
E. Extending Pig
Macros and Imports,UDFs,Using Other Languages to Process Data with Pig,Hands-On Exercise: Extending Pig with Streaming and UDFs
F. Pig Jobs
A. Introduction to Impala
What is Impala?,How Impala Differs from Hive and Pig,How Impala Differs from Relational Databases,Limitations and Future Directions Using the Impala Shell
B. Choosing the best (Hive, Pig, Impala)
Putting it all together and Connecting Dots,Working with Large data sets, Steps involved in analyzing large data
How ETL tools work in big data Industry,Connecting to HDFS from ETL tool and moving data from Local system to HDFS,Moving Data from DBMS to HDFS,Working with Hive with ETL Tool,Creating Map Reduce job in ETL tool,End to End ETL PoC showing Hadoop integration with ETL tool.
Major Project, Hadoop Development, cloudera Certification Tips and Guidance and Mock Interview Preparation, Practical Development Tips and Techniques, certification preparation
Introduction to the IBM Information Server Architecture, the Server Suite components, the various tiers in the Information Server.
Understanding the IBM InfoSphere DataStage, the Job life cycle to develop, test, deploy and run data jobs, high performance parallel framework, real-time data integration.
Introduction to the design elements, various DataStage jobs, creating massively parallel framework, scalable ETL features, working with DataStage jobs.
Understanding the DataStage Job, creating a Job that can effectively extract, transform and load data, cleansing and formatting data to improve its quality.
Learning about data parallelism – pipeline parallelism and partitioning parallelism, the two types of data partitioning – Key-based partitioning and Keyless partitioning, detailed understanding of partitioning techniques like round robin, entire, hash key, range, DB2 partitioning, data collecting techniques and types like round robin, order, sorted merge and same collecting methods.
Understanding the various job stages – data source, transformer, final database, the various parallel stages – general objects, debug and development stages, processing stage, file stage types, database stage, real time stage, restructure stage, data quality and sequence stages of InfoSphere DataStage.
Understanding the parallel job stage editors, the important types of stage editors in DataStage.
Working with the Sequential file stages, understanding runtime column propagation, working with RCP in sequential file stages, using the sequential file stage as a source stage and target stage.
Understanding the difference between dataset and fileset and how DataStage works in each scenario.
Creating of a sample DataStage job using the dataset and fileset types of data.
Learning about the various properties of Sequential File Stage and Dataset stage.
Creating a lookup file set, working in parallel or sequential stage, learning about single input and output link.
Studying the Transformer Stage in DataStage, the basic working of this stage, characteristics -single input, any number of outputs and reject link, how it differs from other processing stages, the significance of Transformer Editor, and evaluation sequence in this stage.
Deep dive into Transformer functions – String, type conversion, null handling, mathematical, utility functions, understanding the various features like constraint, system variables, conditional job aborting, Operators and Trigger Tab.
Understanding the looping functionality in Transformer Stage, output with multiple rows for single input row, the procedure for looping, loop variable properties.
Connecting to the Teradata Enterprise Stage, properties of connection.
Generating data using Row Generator sequentially in a single partition, configuring to run in parallel.
Understanding the Aggregator Stage in DataStage, the two types of aggregation – hash mode and sort mode.
Deep learning of the various stages in DataStage, the importance of Copy, Filter and Modify stages to reduce number of Transformer Stages.
Understanding Parameter Set, storing DataStage and Quality Stage job parameters and default values in files, the procedure to deploy Parameter Sets function and its advantages.
Introduction to Funnel Stage, copying multiple input data sets into single output data set, the three modes – continuous funnel, sort funnel and sequence.
Topics – Understanding the Join Stage and its types, Join Stage Partitioning, performing various Join operations.
Understanding the Lookup Stage for processing using lookup operations, knowing when to use Lookup Stage, partitioning method for Lookup Stage, comparing normal and sparse lookup, doing lookup for a range of values using Range Lookup.
Learning about the Merge Stage, multiple input links and single output link, need for key partitioned and sorted input data set, specifying several reject links in Merge Stage, comparing the Join vs. Lookup vs. Merge Stages of processing.
Studying the FTP Enterprise Stage, transferring multiple files in parallel, invoking the FTP client, transferring to or from remote host using FTP protocol, FTP Enterprise Stage properties.
Understanding the Sort Stage, performing complex sort operations, learning about Stable Sort, removing duplicates.
Working with Teradata Connector in DataStage, configuring as a source, target or parallel in a lookup context for parallel or server jobs, learning about Teradata Parallel Transporter direct API for bulk operations and the Operators deployed.
Learning about the various Database Connector Stages for working with Balanced Optimization Tool.
Understanding the ABAP Extract Stage, extracting data from SAP data repositories, generating ABAP extraction programs, executing SQL query and sending data to DataStage Server.
The various Stages for debugging the parallel job designs, controlling flow of multiple activities in a job sequence, understanding the various data sampling stages in a Debug/Development Stage like Head Stage, Tail Stage and Sample Stage.
Learning about Job Activity Stage which specifies a DataStage Server or parallel job to execute.
Oveview of Pentaho Business Intelligence and Analytics tools, database dimensional modelling, using Star Schema for querying large data sets, understanding fact tables and dimensions tables, Snowflake Schema, principles of Slowly Changing Dimensions.
Understanding the basics of clustering in Pentaho Data Integration, creating a database connection, moving a CSV file input to table output and Microsoft Excel output, moving from Excel to data grid and log.
The Pentaho Data Integration Transformation steps, adding sequence, understanding calculator, Penthao number range, string replace, selecting field value, sorting and splitting rows, string operation, unique row and value mapper.
Working with secure socket command, Pentaho null value and error handling, Pentaho mail, row filter and priorities stream.
Understanding Slowly Changing Dimensions, making ETL dynamic, dynamic transformation, creating folders, scripting, bulk loading, file management, working with Pentaho file transfer, Repository, XML, Utility and File encryption.
Creating dynamic ETL, passing variable and value from job to transformation, deploying parameter with transformation, importance of Repository in Pentaho, database connection, environmental variable and repository import.
Working with Pentaho dashboard and Report, effect of row bending, designing a report, working with Pentaho Server, creation of line, bar and pie chart in Pentaho.
Working with Pentaho Dashboard, passing parameters in Report and Dashboard, drill-down of Report, deploying Cubes for report creation, working with Excel sheet, Pentaho data integration for report creation.
What is a Cube? Creation and benefit of Cube, working with Cube, Report and Dashboard creation with Cube.
Understanding the basics of Multi Dimensional Expression (MDX), basics of MDX, understanding Tuple, its implicit dimensions, MDX sets, level, members, dimensions referencing, hierarchical navigation, and meta data.
Pentaho analytics for discovering, blending various data types and sizes, including advanced analytics for visualizing data across multiple dimensions.
Deploying ETL capabilities for working on the Hadoop ecosystem, integrating with HDFS and moving data from local file to distributed file system, deploying Apache Hive, designing MapReduce jobs, complete Hadoop integration with ETL tool.
Creating interactive dashboards for visualizing highly graphical representation of data for improving key business performance.
Data– Working with Admin Console
This is project that will help you understand the Informatica admin console. You will work on a real world scenario of creating and managing roles and responsibilities for various users and designing the various sessions and workflow in the Informatica ETL environment. This project also includes assigning users to groups, creating a collaborative environment, lock handling on repository objects of Informatica in order to develop a complete mastery of working with the Informatica tool.
Project – Configuration and Logging
Topics – In this SQL Server Integration Services (SSIS) project you will extensively work on multiple data from heterogeneous source into SQL Server. As part of the project you will learn to clean and standardize data and automate the administrative work. Some of the tasks that you will be performing are adding logs to SSIS package, configuration and saving it to an XML file. Upon completion of the project you will have hands-on experience in handling constraints, error row configuration and event handlers.
Project – Report Formatting
Data – Revenue
Topics – This is an Oracle Business Intelligence project that is associated with creating complex dashboards and performing formatting of the report. You will gain hands-on experience in filtering and sorting of the report data depending on the business requirements. This project will also help you understand how to convert the data into graphs for easy visualization and analysis. As part of the project you will gain experience in calculating the subtotal and grand total in a business scenario while finding the revenue generated.
1. Project – Jobs
Problem Statement – It describes that how to create a job using metadata. For this it includes following actions:
Create XML File,Create Delimited File,Create Excel File,Create Database Connection
2. Hadoop Projects
A. Project – Working with Map Reduce, Hive, Sqoop
Problem Statement – It describes that how to import mysql data using sqoop and querying it using hive and also describes that how to run the word count mapreduce job.
B. Project – Connecting Pentaho with Hadoop Eco-system
Problem Statement – It includes:
Quick Overview of ETL and BI,Configuring Pentaho to work with Hadoop Distribution,Loading data into Hadoop cluster,Transforming data into Hadoop cluster
Extracting data from Hadoop Cluster
Project – SCD2 Implementation
Data –Supplier data
Topics – This project is associated with working on the Slowly Changing Dimensions type 2 where entire history is stored in the database. You will learn how to create a surrogate key generator for implementing SCD. This involves creating additional dimensions and segmenting old and new values for extraction.
Project– Pentaho Interactive Report
Data– Sales, Customer, Product
Topics – In this Pentaho project you will be exclusively working on creating Pentaho interactive reports for sales, customer and product data fields. As part of the project you will learn to create a data source, build a Mondrian cube which is represented in an XML file. You will gain advanced experience in managing data sources, building and formatting Pentaho report, change the report template and scheduling of reports.
Intellipaat is a market leader in the ETL tools training. Today ETL tools are increasingly used in business scenarios in order to efficiently derive insights from huge amounts of disparate data. The process of Extract-Transform-Load is pretty standard when getting data from diverse databases, cleansing, filtering, transforming and finally deploying the data into the destination database.
This training includes some of the most powerful and efficient ETL tools like Informatica, SSIS, OBIEE, Talend, DataStage, and Pentaho. The entire course content of this combo trained is created towards helping you clear multiple certifications exams viz. Power Center Developer & Administrator Certification, Oracle Business Intelligence 10 Foundation Essentials (1Z0-526) exam, Talend Data Integration Certified Developer Exam, IBM Certified Solution Developer – InfoSphere DataStage, Pentaho Business Analytics Implementation, Cloudera Spark and Hadoop Developer Certification (CCA175) exam.
This is a completely career-oriented training designed by industry experts. Your training program includes real time projects, step-by-step assignments to evaluate your progress and specially designed quizzes for clearing the requisite certification exams.
Intellipaat also offers lifetime access to videos, course materials, 24/7 Support, and course material upgrades to latest version at no extra fees. Hence it is clearly a one-time investment.
Intellipaat basically offers the self-paced training and online instructor-led training. Apart from that we also provide corporate training for enterprises. All our trainers come with over 12 years of industry experience in relevant technologies and also they are subject matter experts working as consultants. You can check about the quality of our trainers in the sample videos provided.
If you have any queries you can contact our 24/7 dedicated support to raise a ticket. We provide you email support and solution to your queries. If the query is not resolved by email we can arrange for a one-on-one session with our trainers. The best part is that you can contact Intellipaat even after completion of training to get support and assistance. There is also no limit on the number of queries you can raise when it comes to doubt clearance and query resolution.
The Intellipaat self-paced training is for people who want to learn at their own leisurely pace. As part of this program we provide you with one-on-one sessions, doubt clearance over email, 24/7 Live Support, lifetime LMS and upgrade to the latest version at no extra cost. The prices of self-paced training can be 75% lesser than online training. While studying should you face any unexpected challenges then we shall arrange a Virtual LIVE session with the trainer.
We provide you with the opportunity to work on real world projects wherein you can apply your knowledge and skills that you acquired through our training. We have multiple projects that thoroughly test your skills and knowledge of various aspect and components making you perfectly industry-ready. These projects could be in exciting and challenging fields like banking, insurance, retail, social networking, ecommerce, marketing, sales, high technology and so on. The Intellipaat projects are equivalent to six months of relevant experience in the corporate world.
Yes, Intellipaat does provide you with placement assistance. We have tie-ups with 80+ organizations including Ericsson, Cisco, Cognizant, TCS, among others that are looking for skilled & quality professionals and we would be happy to assist you with the process of preparing yourself for the interview and the job.
Yes, if you would want to upgrade from the self-paced training to instructor-led training then you can easily do so by paying the difference of the fees amount and joining the next batch of classes which shall be separately notified to you.
Upon successful completion of training you have to take a set of quizzes, complete the projects and upon review and on scoring over 60% marks in the qualifying quiz the official Intellipaat verified certificate is awarded.The Intellipaat Certification is a seal of approval and is highly recognized in 80+ corporations around the world including many in the Fortune 500 list of companies.
This training course is designed for clearing the following exams:
The entire training course content is in line with respective certification program and helps you clear the requisite certification exam with ease and get the best jobs in the top MNCs.
As part of this training you will be working on real time projects and assignments that have immense implications in the real world industry scenario thus helping you fast track your career effortlessly.
At the end of this training program there will be quizzes that perfectly reflect the type of questions asked in the respective certification exams and helps you score better marks in certification exam.
Intellipaat Course Completion certificate will be awarded on the completion of Project work (on expert review)and upon scoring of at least 60% marks in the quiz. Intellipaat certification is well recognized in top 80+ MNCs like Ericsson, Cisco, Cognizant, Sony, Mu Sigma, Saint-Gobain, Standard Chartered, TCS, Genpact, Hexaware, etc.
You will get Lifetime access to high quality interactive tutorials along with life time access to complete Course Material .There will be 24/7 access to video tutorials with email support. If you stuck in any unexpected problem we will provide online interactive sessions with trainer for issue resolving.
We provide 24X7 support by email for issues or doubts clearance for Self-paced training.
In online Instructor led training, trainer will be available to help you out with your queries regarding the course. If required, the support team can also provide you live support by accessing your machine remotely. This ensures that all your doubts and problems faced during labs and project work are clarified round the clock.
This course is designed for clearing Informatica Professional Certification Exam, OBIEE 11g Certification Exam, Talend Certification Exam, IBM Certified Solution Developer – InfoSphere DataStage v8.5, Pentaho Certification Exam, Cloudera Certification for Hadoop. At the end of the course there will be a quiz and project assignments once you complete them you will be awarded with Intellipaat Course Completion certificate.
24th February 2017
7th February 2017
"PMI®", "PMP®" and "PMI-ACP®" are registered marks of the Project Management Institute, Inc.
The Open Group®, TOGAF® are trademarks of The Open Group.
The Swirl logoTM is a trade mark of AXELOS Limited.
ITIL® is a registered trade mark of AXELOS Limited.
PRINCE2® is a Registered Trade Mark of AXELOS Limited.
Certified ScrumMaster® (CSM) and Certified Scrum Trainer® (CST) are registered trademarks of SCRUM ALLIANCE®
Professional Scrum Master is a registered trademark of Scrum.org