ETL Tools Training

Our ETL tools master's program will help you gain proficiency in top ETL tools such as Informatica, SSIS, OBIEE, Talend, DataStage and Pentaho. You will work on real-world projects in data warehousing, data integration, Hadoop connectivity, data modeling, slowly changing dimensions (SCD) and data schemas.

Get the Data Warehousing and ERwin courses free with this course!

Key Features

  • Self-paced Videos: 146 Hrs
  • Lab & Project Hrs: 292 Hrs
  • Access Duration: Lifetime
  • 24 x 7 Support
  • Flexible Schedule
  • Certification and Job Assistance

About ETL Tools Training Course

This Intellipaat training in ETL tools will give you a powerful head start in the extract, transform and load processes that cater to the Business Intelligence domain. This all-in-one course covers six of the most widely used ETL tools, and upon completing the training you will be a certified ETL professional.

List of Courses Included

  • Informatica
  • SSIS
  • OBIEE
  • Talend
  • DataStage
  • Pentaho

What will you learn in this training course?

  1. Introduction to ETL and its importance in data warehousing
  2. Setting up and installation of various ETL tools
  3. Optimizing ETL tools for best results based on specific job requirements
  4. OLAP, ETL and OLTP systems
  5. Data modeling and dimensional modeling
  6. Working with star and snowflake schemas
  7. Dimensions, measures and fact tables in data warehousing
  8. Data types and performance tuning in ETL
  9. SCD types and various methods of handling SCD data
  10. Various types of data transformation techniques
  11. Source qualifier transformation and mappings

Who should take up this ETL training course?

  • ETL and SQL Developers and Business Intelligence Professionals
  • Database, Data Warehousing and Mainframe Professionals and Project Managers

What are the prerequisites for taking up this ETL training course?

Anybody can take up this training course. However, having a basic knowledge of SQL can be helpful.

Why should you take up this training course?

The process of ETL is of absolute importance in any data warehousing and Business Intelligence scenario. Getting the ETL process right has a direct bearing on the quality of data that is loaded into the data warehouse, which in turn affects the quality of the Business Intelligence and, finally, the business insights that are derived. This Intellipaat training is your one stop for mastering some of the best ETL tools available in the market today. Upon completing this course, you can command top salaries in the ETL, data warehousing and Business Intelligence domains in top MNCs around the world.

Informatica Course Content

Data Warehousing and Cleansing Concepts

What is data warehousing, understanding the extract, transform and load process, data aggregation, data scrubbing, data cleansing and the importance of Informatica PowerCenter ETL.
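Before any tool-specific work, it helps to see what "scrubbing" and "cleansing" mean concretely. The sketch below is a minimal, tool-agnostic illustration (the field names `id`, `name` and `email` are hypothetical, not part of any Informatica schema): trim whitespace, normalize case, drop rows missing their key, and de-duplicate.

```python
# A minimal data-cleansing sketch (illustrative, not Informatica-specific):
# trim whitespace, normalize case, drop rows with missing keys, de-duplicate.

def clean_rows(rows):
    """rows: list of dicts with hypothetical 'id', 'name', 'email' fields."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row.get("id"):          # scrub rows missing the key
            continue
        name = (row.get("name") or "").strip().title()
        email = (row.get("email") or "").strip().lower()
        key = (row["id"], email)
        if key in seen:                # de-duplicate on id + email
            continue
        seen.add(key)
        cleaned.append({"id": row["id"], "name": name, "email": email})
    return cleaned

raw = [
    {"id": 1, "name": "  alice ", "email": "ALICE@EXAMPLE.COM"},
    {"id": 1, "name": "Alice",    "email": "alice@example.com"},  # duplicate
    {"id": None, "name": "ghost", "email": "x@example.com"},      # missing key
]
print(clean_rows(raw))  # one clean row for Alice
```

Real ETL tools apply the same ideas through configurable transformations rather than hand-written code.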

Informatica Installation and Configuration

Configuring the Informatica tool, installing Informatica, operational administration activities and integration services.

Hands-on Exercise – Step by step process for the installation of Informatica PowerCenter

Working with active and passive transformation

Understanding the difference between active and passive transformations and the highlights of each transformation.

Working with expression transformation

Learning about expression transformation, connected passive transformation to calculate value on a single row.

Hands-on Exercise – Calculate value on a single row using connected passive transformation

Working with Sorter, Sequence Generator, Filter transformation

The different types of transformations like Sorter, Sequence Generator and Filter, the characteristics of each and where they are used.

Hands-on Exercise – Transform data using Filter technique, Use a sequence generator, Use a Sorter

Working with Joiner Transformation

Joiner transformation to bring data from heterogeneous data sources.

Hands-on Exercise – Use Joiner transformation to bring data from heterogeneous data sources

Working with Ranking and Union Transformation

Understanding the Ranking and Union transformation, the characteristics and deployment.

Hands-on Exercise – Perform Ranking and Union transformation

Syntax for Rank and Dense Rank

Learn about the rank and dense rank functions and the syntax for them.

Hands-on Exercise – Perform rank and dense rank functions
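The difference between the two functions is easiest to see on tied values: `RANK` leaves gaps after ties, `DENSE_RANK` does not. The sketch below uses Python's bundled SQLite (window functions require SQLite 3.25+, shipped with recent Python builds); the `sales` table is illustrative.

```python
import sqlite3

# RANK leaves gaps after ties; DENSE_RANK does not.
# Requires SQLite 3.25+ (bundled with recent Python builds).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("a", 100), ("b", 200), ("c", 200), ("d", 300)])
rows = conn.execute("""
    SELECT rep, amount,
           RANK()       OVER (ORDER BY amount DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY amount DESC) AS drnk
    FROM sales
""").fetchall()
for rep, amount, rnk, drnk in rows:
    print(rep, amount, rnk, drnk)
# the two 200s share rank 2; the next RANK is 4, the next DENSE_RANK is 3
```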

Router Transformation

Understanding how router transformation works and its key features.

Hands-on Exercise – Perform router transformation

Source Qualifier Transformation and Mappings

Lookup transformation overview and the different types of lookup transformation: connected, unconnected, dynamic and static

Hands-on Exercise – Perform Lookup transformation: Connected, unconnected, Dynamic and Static

Slowly Changing Dimension in Informatica

What is SCD?, processing XML, learning how to handle flat files, listing and defining various transformations, implementing 'for loop' in PowerCenter, concepts of pushdown optimization and partitioning, what is constraint-based loading?, what is incremental aggregation?

Hands-on Exercise – Load data from a flat file, implement 'for loop' in PowerCenter, use pushdown optimization and partitioning, do constraint-based data loading, use the incremental aggregation technique to aggregate data
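The SCD handling referred to above can be sketched in plain code. This is a hedged illustration of the Type 2 pattern (history-preserving): when a tracked attribute changes, the current dimension row is expired and a new "current" version is inserted. The row layout (`customer_id`, `city`, `valid_from`, `valid_to`, `is_current`) is illustrative, not an Informatica-specific schema.

```python
from datetime import date

# SCD Type 2 sketch: expire the current dimension row on change and
# append a new current version, preserving full history.

def apply_scd2(dim_rows, customer_id, new_city, today):
    for row in dim_rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return dim_rows                # no change, nothing to do
            row["valid_to"] = today            # expire the old version
            row["is_current"] = False
            break
    dim_rows.append({"customer_id": customer_id, "city": new_city,
                     "valid_from": today, "valid_to": None,
                     "is_current": True})
    return dim_rows

dim = [{"customer_id": 7, "city": "Pune",
        "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True}]
apply_scd2(dim, 7, "Mumbai", date(2021, 6, 1))
print(len(dim), dim[-1]["city"], dim[0]["is_current"])  # 2 Mumbai False
```

Type 1 would simply overwrite `city` in place, and Type 3 would keep one extra "previous value" column instead of full history.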

Mapplets and loading to multiple targets

Different types of designer objects: mapplet, worklet, target load plan, loading to multiple targets, linking properties

Hands-on Exercise – Create a mapplet and a worklet, Plan a target load, Load multiple targets

Performance Tuning in Informatica

Objectives of performance tuning, defining performance tuning, learning the sequence for tuning

Hands-on Exercise – Do performance tuning by following different techniques

Repository Manager

Managing repository, repository manager – client tool, functionalities of previous versions, important tasks in repository manager

Hands-on Exercise – Manage tasks in repository manager

Best Practices in Informatica

Understanding and adopting best practices for managing repository.

Workflow Informatica

Common tasks in workflow manager, creating dependencies, scope of workflow monitor

Hands-on Exercise – Create workflow with dependencies of nodes

Parameters & Variables

Define variable and parameter in Informatica, parameter files and their scope, parameter of mapping, worklet and session parameter, workflow and service variable, basic development errors

Hands-on Exercise – Define variables and parameter in functions, Use parameter of mapping, Use worklet and session parameter, Use workflow and service variable

Error handling and recovery in Informatica

Session and workflow logs, using debuggers, the error handling framework in Informatica, failover and high availability

Hands-on Exercise – Debug development errors, Read Workflow logs, Use Error handling framework

High Availability & Failover in Informatica

Configurations and mechanisms in recovery, checking the health of the PowerCenter environment

Hands-on Exercise – Configure recovery options, Check health of Powercenter environment

Working with different utilities in Informatica

infacmd, pmrep, infasetup, processing of flat file

Hands-on Exercise – Use commands infacmd, pmrep, infasetup

Flat file processing (advanced transformations)

Fixed-length and delimited files, expression transformations – sequence numbers, dynamic targeting using transaction control.

Hands-on Exercise – Perform expression transformations – sequence numbers, dynamic targeting using transaction control.
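The contrast between the two flat-file layouts can be shown in a few lines: a delimited file is split on its separator, a fixed-length file is sliced by column positions. Field names and column widths below are illustrative.

```python
# Delimited vs. fixed-length flat-file parsing (illustrative widths: 3, 10, 10).

delimited_line = "101,Alice,Pune"
fixed_line     = "101Alice     Pune      "

# Delimited: split on the separator.
d_id, d_name, d_city = delimited_line.split(",")

# Fixed-length: slice by column positions, then strip the padding.
f_id   = fixed_line[0:3].strip()
f_name = fixed_line[3:13].strip()
f_city = fixed_line[13:23].strip()

print((d_id, d_name, d_city))  # ('101', 'Alice', 'Pune')
print((f_id, f_name, f_city))  # ('101', 'Alice', 'Pune')
```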

Dynamic targeting

Dynamic target with use of transaction control, indirect loading.

Hands-on Exercise – Use of transaction control with dynamic target, indirect loading.

Working with Java transformations

Importance of Java transformations to extend Power Center capabilities, transforming data, active and passive mode.

Hands-on Exercise – Use Java transformations to extend Power Center capabilities

Unconnected Stored Procedure usage

Understanding unconnected stored procedure in Informatica, the different scenarios of unconnected stored procedure usage.

Hands-on Exercise – Use unconnected stored procedure in Informatica in different scenarios

Advanced Concepts in SCD

Use of SQL transformation (active and passive)

Hands-on Exercise – Use of SQL transformation (active and passive)

Incremental Data Loading and Aggregation

Understanding incremental loading and aggregation and a comparison between them.

Hands-on Exercise – Do Incremental Loading and aggregation
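The core idea of incremental aggregation is to keep running totals plus a watermark (the last processed row id), so each run only folds in rows it has not seen before. This is a hedged, tool-agnostic sketch; the tuple layout `(row_id, product, amount)` is hypothetical.

```python
# Incremental aggregation sketch: running totals plus a watermark so
# repeated runs only process new rows.

state = {"watermark": 0, "totals": {}}   # totals keyed by product

def incremental_aggregate(state, rows):
    """rows: iterable of (row_id, product, amount) tuples, id ascending."""
    for row_id, product, amount in rows:
        if row_id <= state["watermark"]:
            continue                      # already aggregated in a prior run
        state["totals"][product] = state["totals"].get(product, 0) + amount
        state["watermark"] = row_id
    return state

incremental_aggregate(state, [(1, "pen", 5), (2, "ink", 3)])
# second run re-presents old rows plus one new row; only row 3 is added
incremental_aggregate(state, [(1, "pen", 5), (2, "ink", 3), (3, "pen", 2)])
print(state["totals"])  # {'pen': 7, 'ink': 3}
```

A full (non-incremental) load would instead recompute every total from scratch on each run.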

Constraint based loading

Working with database constraints using PowerCenter, understanding constraint based loading and target load order.

Hands-on Exercise – Perform constraint based loading in a given order
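Constraint-based loading means the target load order must respect foreign keys: parent (dimension) rows go in before child (fact) rows. The SQLite sketch below (illustrative table names) shows the constraint rejecting an out-of-order load.

```python
import sqlite3

# Constraint-based load order sketch: with foreign keys enforced,
# parent (dimension) rows must be loaded before child (fact) rows.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(id),
    amount INTEGER)""")

try:  # loading the child first violates the constraint
    conn.execute("INSERT INTO fact_sales VALUES (1, 500)")
except sqlite3.IntegrityError as e:
    print("out-of-order load rejected:", e)

# correct target load order: parent first, then child
conn.execute("INSERT INTO dim_product VALUES (1, 'pen')")
conn.execute("INSERT INTO fact_sales VALUES (1, 500)")
print(conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0])  # 1
```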

XML Transformation and active look up

The various types of XML transformation in Informatica, configuring a lookup as active.

Hands-on Exercise – Perform XML transformation, Configure a lookup as active

Profiling in PowerCenter

Understanding what is data profiling in Informatica, its significance in validating content, ensuring quality and structure of data as per business requirements.

Hands-on Exercise – Create data profiling in Informatica and validate the content

Workflow Creation and Deletion

Understanding workflow as a group of instruction/command for integration services, learning how to create and delete workflow in Informatica.

Hands-on Exercise – Create and delete workflow in Informatica

Database Connection

Understanding the Database Connection, creating a new database connection in Informatica, the various steps involved.

Hands-on Exercise – Create a new database connection in Informatica

Relational Database Tables

Working with relational database tables in Informatica, mapping for loading data from flat files to relational database files.

Hands-on Exercise – Create mapping for loading data from flat files to relational database files

LinkedIn Connection

Understanding how to deploy PowerCenter for seamless LinkedIn connectivity with Informatica PowerCenter.

Hands-on Exercise – Deploy PowerCenter for seamless LinkedIn connectivity with Informatica PowerCenter.

Connection with Sources

Connecting Informatica PowerCenter with various data sources like Social media channels like Facebook, Twitter, etc.

Hands-on Exercise – Connect Informatica PowerCenter with various data sources like Social media channels like Facebook, Twitter, etc.

Pushdown Optimization & Partitioning

Pushdown optimization for load balancing on the server for better performance, the various types of partitioning for optimizing performance.

Hands-on Exercise – Optimize using the pushdown technique for load balancing on the server for better performance, Create various types of partitioning for optimizing performance

Cache management

Understanding session cache, the importance of cache creation, implementing session cache, calculating cache requirements

Hands-on Exercise – Implement cache creation, Work with session cache

MSBI SSIS Course Content

What is BI?

Introduction to Business Intelligence, understanding the concept of Data Modeling, Data Cleaning, learning about Data Analysis, Data Representation, Data Transformation.

ETL Overview

Introduction to ETL, the various steps involved Extract, Transform, Load, using a user’s email ID to read a flat file, extracting the User ID from email ID, loading the data into a database table.
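The extract-transform-load example described above can be sketched end to end: read emails from a flat file (a list stands in for the file here), extract the user ID (the part before the `@`), and load the result into a database table.

```python
import sqlite3

# Sketch of the ETL flow described above: extract emails from a flat file,
# transform by splitting out the user ID, load into a database table.
flat_file = ["alice@example.com", "bob@example.com"]  # stands in for the file

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, email TEXT)")
for email in flat_file:                       # Extract
    user_id = email.split("@", 1)[0]          # Transform
    conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, email))  # Load
print(conn.execute("SELECT user_id FROM users").fetchall())
# [('alice',), ('bob',)]
```

In SSIS the same three steps would be a flat-file source, a derived-column transformation and an OLE DB destination.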

Working with Connection Managers

Introduction to Connection Managers – the logical representation of a connection, the various types of Connection Managers – flat file, database, understanding how to load faster with OLE DB, comparing the performance of different providers, learning about Bulk Insert, working with Excel Connection Managers and identifying common problems.

Data Transformations

Learning what data transformation is, converting data from one format to another, understanding the concepts of Character Map, Data Column and Copy Column transformations, Import and Export Column transformations, Script and OLE DB Command transformations, Aggregate and Sort transformations, percentage and row sampling.

Advanced Data Transformation

Understanding Pivot and Unpivot Transformations, Audit and Row Count Transformations, working with Split and Join Transformations, studying Lookup and Cache Transformations, integrating with Azure Analysis Services, the elastic nature of MSBI to integrate with the Azure cloud service, the scale-out deployment option for MSBI, working with cloud-borne data sources and query analysis, scaling out the SSIS package, deploying for tighter windows, working with larger volumes of source data, SQL Server vNext for enhancing SQL Server features, more choice of development languages and data types both on-premises and in the cloud.

Slowly Changing Dimensions

Understanding data that slowly changes over time, learning the process of how new data is written over old data, best practices. Detailed explanation of the three types of SCDs – Type 1, Type 2 and Type 3 – and their differences.

Overview of Fuzzy Look-up Transformation and Lookup and Term Extraction

Understanding how Fuzzy Lookup Transformation differs from Lookup Transformation, the concept of fuzzy matching.
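An ordinary lookup only matches exact values; a fuzzy lookup also catches near-misses such as misspelled names. SSIS uses its own similarity algorithm; the sketch below just illustrates the idea of approximate matching using Python's standard-library `difflib`.

```python
from difflib import SequenceMatcher, get_close_matches

# Fuzzy-matching illustration: "Jon Smith" is not an exact match for any
# reference value, so a plain lookup would miss it, but a similarity-based
# match still finds the intended row.
reference = ["John Smith", "Jane Doe", "Bob Brown"]

match = get_close_matches("Jon Smith", reference, n=1, cutoff=0.6)
print(match)  # ['John Smith']

# The underlying similarity score (1.0 would be an exact match):
print(SequenceMatcher(None, "Jon Smith", "John Smith").ratio())
```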

Concepts of Logging & Configuration

Learning about error rows configuration, package logging, defining package configuration, understanding constraints and event handlers.

OBIEE Course Content

Data Modeling Concepts

Introduction to OBIEE, installation of OBIEE, what are data models and why you need them, the scope, reach and benefits of data modeling, data warehousing, a sample OBIEE report, the business requirement intrinsic in data modeling, various case studies, the data modeling implications and the impact of data modeling on business intelligence.

Business Intelligence Concepts

Introduction to Business Intelligence, the architecture of data flow, OBIEE architecture, stack description of BI technology, BI Server, BI Scheduler, displaying reports with data, the need for reporting in business, the distinction between OLTP and OLAP, the BI platform in the BI technology stack, the product and dimension hierarchy, multidimensional and relational analytical processing, types of reports, multidimensional modeling.

Understanding OLAP

Online Analytical Processing, the OBIEE admin tools, RPD, important concepts and terminology, the significance of OLAP in the business intelligence life cycle, understanding various data schemas like star, snowflake and constellation, designing with the Star Schema, creation of the physical layer and a simple RPD, the enterprise information model, aggregate and calculated measures.
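The star schema mentioned above is simply a central fact table keyed to surrounding dimension tables, and a typical OLAP query joins the fact to its dimensions and aggregates. A minimal SQLite sketch (table and column names illustrative):

```python
import sqlite3

# Minimal star schema: one fact table joined to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales  (date_id INTEGER, product_id INTEGER,
                              amount INTEGER);
    INSERT INTO dim_date    VALUES (1, 2020), (2, 2021);
    INSERT INTO dim_product VALUES (10, 'pen'), (11, 'ink');
    INSERT INTO fact_sales  VALUES (1, 10, 100), (2, 10, 150), (2, 11, 80);
""")
# A typical star-schema query: join the fact to its dimensions, then group.
rows = conn.execute("""
    SELECT d.year, p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d    ON d.date_id = f.date_id
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY d.year, p.name
    ORDER BY d.year, p.name
""").fetchall()
print(rows)  # [(2020, 'pen', 100), (2021, 'ink', 80), (2021, 'pen', 150)]
```

A snowflake schema would further normalize the dimension tables into sub-dimensions, trading simpler storage for extra joins.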

Oracle Business Intelligence Suite

Introduction to Oracle Business Intelligence Enterprise Edition, overview of the OBIEE product, the architecture of OBIEE, key features and components, creating a simple report, business model, hierarchy, presentation and mapping.

Oracle BI Repository

Understanding the Oracle Business Intelligence Repository, installation of OBIEE on a Windows system, directory structure installation, services, analytics and interactive reporting, dashboard creation, multiple report creation, formula editing, altering column properties

BI Repository Business Model

Understanding how to build the Business Model and Mapping Layer in the BI Repository, creating the Presentation Layer, formatting of data, conditional formatting, removing filters, saving the report, shared folder and my folder, creating a new folder.

Business Model Continued

Working with the Enterprise Manager, testing and validating the Repository, cache disabling, dashboard prompts, filtering, editing a dashboard with action links, the waterfall model.

Working with Repository

Working with the Repository, creating a test report, adding calculations, deploying OBIEE analyses, coming up with the landing page UI and its features, repository variables, session and presentation variables.

BI Presentation Catalog

Learning about the Oracle BI Presentation Catalog, accessing and managing objects, report archiving and exporting, data grouping and limiting in analyses, data formatting, conditional formatting, master-detail reports, report creation with multiple subject areas, data mashup, visual analyzer, performance tiles, BI functionality, the waterfall model, graphs, pivot tables, pie charts, KPI watchlists.

Dashboard Creation

The OBIEE dashboard setup, basics of dashboards and dashboard pages, deploying Dashboard Builder for building dashboards, editing, sharing and saving dashboard analyses, cache creation and clearing, ODBC functions in OBIEE, Logical Table Source, summary and detail reports.

OBIEE Security & Management

Securing the Oracle Business Intelligence Suite with Enterprise Manager, creating alerts, managing grouping and maintenance, administration, the various types of security in OBIEE, object, task and folder level security, report scheduling.

Talend For Hadoop Course Content

Getting started with Talend

Working of Talend, introduction to Talend Open Studio and its usability, what is metadata?

Creating a new Job, concept and creation of delimited files, using metadata and its significance, what is propagation?, data integration schemas, creating Jobs using tFilterRow and string filters, input delimited file creation

Overview of Schema and Aggregation

Job design and its features, what is tMap?, data aggregation, introduction to tReplicate and its working, significance and working of tLog, tMap and its properties

Connectivity with Data Source

Extracting data from the source, source and target in a database (MySQL), creating a connection, importing schema or metadata

Getting started with Routines/Functions

Calling and using functions, what are routines?, use of XML files in Talend, working of format data functions, what is type casting?

Data Transformation

Defining context variables, learning parameterization in ETL, writing an example using tRowGenerator, defining and implementing sorting, what is an aggregator?, using t flow for publishing data, running a Job in a loop

Connectivity with Hadoop

Learning to start the Thrift server, connecting the ETL tool with Hadoop, defining the ETL method, implementation of Hive, data import into Hive with an example, an example of partitioning in Hive, why the customer table is not overwritten, components of ETL, Hive vs. Pig, data loading using a demo customer table, ETL tools, parallel data execution

Introduction to Hadoop and its Ecosystem, Map Reduce and HDFS

Big Data, factors constituting Big Data, Hadoop and the Hadoop ecosystem, MapReduce – concepts of map, reduce, ordering, concurrency, shuffle and reducing, Hadoop Distributed File System (HDFS) concepts and its importance, deep dive into MapReduce – execution framework, partitioner, combiner, data types, key-value pairs, HDFS deep dive – architecture, data replication, NameNode, DataNode, data flow, parallel copying with DistCp, Hadoop Archives
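The map, shuffle and reduce phases listed above can be modeled in a few lines. This is a toy, single-process illustration of the programming model (a word count), not the distributed framework itself: map emits `(key, value)` pairs, shuffle groups them by key, and reduce folds each group.

```python
from collections import defaultdict

# Toy model of the MapReduce phases: map emits (key, value) pairs,
# shuffle groups by key, reduce folds each group to a result.

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big etl"])))
print(counts)  # {'big': 2, 'data': 1, 'etl': 1}
```

In real Hadoop, the map and reduce functions run on many nodes and the shuffle moves data across the network between them.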

Hands on Exercises

Installing Hadoop in pseudo-distributed mode, understanding important configuration files, their properties and daemon threads, accessing HDFS from the command line

MapReduce – basic exercises, understanding the Hadoop ecosystem, introduction to Sqoop, use cases and installation, introduction to Hive, use cases and installation, introduction to Pig, use cases and installation, introduction to Oozie, use cases and installation, introduction to Flume, use cases and installation, introduction to YARN

Mini Project – Importing MySQL data using Sqoop and querying it using Hive

Deep Dive in Map Reduce

How to develop a MapReduce application, writing unit tests, best practices for developing, writing and debugging MapReduce applications, joining data sets in MapReduce


A. Introduction to Hive

What is Hive?, Hive schema and data storage, comparing Hive to traditional databases, Hive vs. Pig, Hive use cases, interacting with Hive

B. Relational Data Analysis with Hive

Hive databases and tables, basic HiveQL syntax, data types, joining data sets, common built-in functions, Hands-On Exercise: running Hive queries on the shell, scripts and Hue

C. Hive Data Management

Hive data formats, creating databases and Hive-managed tables, loading data into Hive, altering databases and tables, self-managed tables, simplifying queries with views, storing query results, controlling access to data, Hands-On Exercise: data management with Hive

D. Hive Optimization

Understanding query performance, partitioning, bucketing, indexing data

E. Extending Hive

Topics: User-Defined Functions

F. Hands-On Exercises – Working with huge data sets and querying extensively.

G. User defined Functions, Optimizing Queries, Tips and Tricks for performance tuning


A. Introduction to Pig

What is Pig?, Pig's features, Pig use cases, interacting with Pig

B. Basic Data Analysis with Pig

Pig Latin syntax, loading data, simple data types, field definitions, data output, viewing the schema, filtering and sorting data, commonly used functions, Hands-On Exercise: using Pig for ETL processing

C. Processing Complex Data with Pig

Complex/nested data types, grouping, iterating grouped data, Hands-On Exercise: analyzing data with Pig

D. Multi-Data Set Operations with Pig

Techniques for combining data sets, joining data sets in Pig, set operations, splitting data sets, Hands-On Exercise

E. Extending Pig

Macros and imports, UDFs, using other languages to process data with Pig, Hands-On Exercise: extending Pig with streaming and UDFs

F. Pig Jobs


A. Introduction to Impala

What is Impala?, how Impala differs from Hive and Pig, how Impala differs from relational databases, limitations and future directions, using the Impala shell

B. Choosing the best (Hive, Pig, Impala)

Major Project – Putting it all together and Connecting Dots

Putting it all together and connecting the dots, working with large data sets, steps involved in analyzing large data

ETL Connectivity with Hadoop Ecosystem

How ETL tools work in the big data industry, connecting to HDFS from an ETL tool and moving data from the local system to HDFS, moving data from a DBMS to HDFS, working with Hive with an ETL tool, creating a MapReduce job in an ETL tool, an end-to-end ETL PoC showing Hadoop integration with the ETL tool.

Job and Certification Support

Major project, Hadoop development, Cloudera certification tips and guidance, mock interview preparation, practical development tips and techniques, certification preparation

DataStage Course Content

Information Server  

Introduction to the IBM Information Server Architecture, the Server Suite components, the various tiers in the Information Server.

InfoSphere DataStage           

Understanding the IBM InfoSphere DataStage, the Job life cycle to develop, test, deploy and run data jobs, high performance parallel framework, real-time data integration.

DataStage Features

Introduction to the design elements, various DataStage jobs, creating massively parallel framework, scalable ETL features, working with DataStage jobs.

DataStage Job           

Understanding the DataStage Job, creating a Job that can effectively extract, transform and load data, cleansing and formatting data to improve its quality.

Parallelism, Partitioning and Collecting        

Learning about data parallelism – pipeline parallelism and partitioning parallelism, the two types of data partitioning – Key-based partitioning and Keyless partitioning, detailed understanding of partitioning techniques like round robin, entire, hash key, range, DB2 partitioning, data collecting techniques and types like round robin, order, sorted merge and same collecting methods.
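Two of the partitioning schemes listed above are easy to contrast in code: round robin spreads rows evenly regardless of content, while key-based (hash) partitioning routes all rows with the same key to the same partition. A hedged, tool-agnostic sketch (row layout illustrative):

```python
import zlib

# Round robin spreads rows evenly; hash partitioning keeps all rows with
# the same key together in one partition.

def round_robin(rows, n):
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, key, n):
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 gives a stable hash (Python's built-in hash() of strings is
        # randomized per process, so it is avoided here)
        parts[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return parts

rows = [{"cust": "a"}, {"cust": "b"}, {"cust": "a"}, {"cust": "c"}]
print(round_robin(rows, 2))        # rows alternate across 2 partitions

by_key = hash_partition(rows, "cust", 2)
# both "a" rows land in the same partition, whichever one that is
a_parts = [i for i, p in enumerate(by_key) for r in p if r["cust"] == "a"]
print(a_parts[0] == a_parts[1])    # True
```

Key-based partitioning is what makes per-key operations (joins, aggregations, sorts) correct in parallel, since each partition sees all rows for its keys.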

Job Stages of  InfoSphere DataStage               

Understanding the various job stages – data source, transformer, final database, the various parallel stages – general objects, debug and development stages, processing stage, file stage types, database stage, real time stage, restructure stage, data quality and sequence stages of InfoSphere DataStage.

Stage Editor               

Understanding the parallel job stage editors, the important types of stage editors in DataStage.

Sequential File         

Working with the Sequential file stages, understanding runtime column propagation, working with RCP in sequential file stages, using the sequential file stage as a source stage and target stage.

Dataset and Fileset

Understanding the difference between dataset and fileset and how DataStage works in each scenario.

Sample Job Creation

Creating of a sample DataStage job using the dataset and fileset types of data.

Properties of Sequential File stage and Data Set Stage       

Learning about the various properties of Sequential File Stage and Dataset stage.

Lookup File Set Stage

Creating a lookup file set, working in parallel or sequential stage, learning about single input and output link.

Transformer Stage

Studying the Transformer Stage in DataStage, the basic working of this stage, characteristics -single input, any number of outputs and reject link, how it differs from other processing stages, the significance of Transformer Editor, and evaluation sequence in this stage.

Transformer Stage Functions & Features

Deep dive into Transformer functions – String, type conversion, null handling, mathematical, utility functions, understanding the various features like constraint, system variables, conditional job aborting, Operators and Trigger Tab.

Looping Functionality

Understanding the looping functionality in Transformer Stage, output with multiple rows for single input row, the procedure for looping, loop variable properties.

Teradata Enterprise Stage

Connecting to the Teradata Enterprise Stage, properties of connection.

Single partition and parallel execution

Generating data using Row Generator sequentially in a single partition, configuring to run in parallel.

Aggregator Stage  

Understanding the Aggregator Stage in DataStage, the two types of aggregation – hash mode and sort mode.

Different Stages Of Processing

A deep dive into the various stages in DataStage, the importance of the Copy, Filter and Modify stages to reduce the number of Transformer Stages.

Parameters and Value File              

Understanding Parameter Set, storing DataStage and Quality Stage job parameters and default values in files, the procedure to deploy Parameter Sets function and its advantages.

Funnel Stage

Introduction to Funnel Stage, copying multiple input data sets into single output data set, the three modes – continuous funnel, sort funnel and sequence.

Join Stage

Topics – Understanding the Join Stage and its types, Join Stage Partitioning, performing various Join operations.

Lookup Stage

Understanding the Lookup Stage for processing using lookup operations, knowing when to use Lookup Stage, partitioning method for Lookup Stage, comparing normal and sparse lookup, doing lookup for a range of values using Range Lookup.

Merge Stage           

Learning about the Merge Stage, multiple input links and single output link, need for key partitioned and sorted input data set, specifying several reject links in Merge Stage, comparing the Join vs. Lookup vs. Merge Stages of processing.

FTP Enterprise Stage           

Studying the FTP Enterprise Stage, transferring multiple files in parallel, invoking the FTP client, transferring to or from remote host using FTP protocol, FTP Enterprise Stage properties.

Sort Stage 

Understanding the Sort Stage, performing complex sort operations, learning about Stable Sort, removing duplicates.

Teradata Connector             

Working with Teradata Connector in DataStage, configuring as a source, target or parallel in a lookup context for parallel or server jobs, learning about Teradata Parallel Transporter direct API for bulk operations and the Operators deployed.

Connector Stages  

Learning about the various Database Connector Stages for working with Balanced Optimization Tool.

ABAP Extract Stage               

Understanding the ABAP Extract Stage, extracting data from SAP data repositories, generating ABAP extraction programs, executing SQL query and sending data to DataStage Server.

Development / Debug Stages

The various Stages for debugging the parallel job designs, controlling flow of multiple activities in a job sequence, understanding the various data sampling stages in a Debug/Development Stage like Head Stage, Tail Stage and Sample Stage.

Job Activity Stage 

Learning about Job Activity Stage which specifies a DataStage Server or parallel job to execute.

Pentaho Course Content

Introduction to Pentaho Tool

Pentaho user console, overview of Pentaho Business Intelligence and Analytics tools, database dimensional modeling, using the Star Schema for querying large data sets, understanding fact tables and dimension tables, the Snowflake Schema, principles of Slowly Changing Dimensions, knowledge of how high availability is supported for the DI server and BA server, managing Pentaho artifacts, knowledge of big data solution architectures

Hands-on Exercise – Schedule a report using user console, Create model using database dimensional modeling techniques, create a Star Schema for querying large data sets, Use fact tables and dimensions tables, manage Pentaho artifacts

Data Architecture

Designing data models for reporting, Pentaho support for predictive analytics, Design a Streamlined Data Refinery (SDR) solution for a client

Hands-on Exercise – Design data models for reporting, Perform predictive analytics on a data set, design a Streamlined Data Refinery (SDR) solution for a dummy client

Clustering in Pentaho

Understanding the basics of clustering in Pentaho Data Integration, creating a database connection, moving a CSV file input to table output and Microsoft Excel output, moving from Excel to data grid and log.

Hands-on Exercise – Create a database connection, move a csv file input to table output and Microsoft excel output, move data from excel to data grid and log

Data Transformation

The Pentaho Data Integration transformation steps, adding a sequence, understanding the calculator, Pentaho number range, string replace, selecting field values, sorting and splitting rows, string operations, unique rows and value mapper, usage of metadata injection

Hands-on Exercise – Practice various steps to perform data integration transformation, add sequence, use calculator, work on number range, selecting field value, sorting and splitting rows, string operation, unique row and value mapper, use metadata injection

Pentaho Flow

Working with secure socket command, Pentaho null value and error handling, Pentaho mail, row filter and priorities stream.

Hands-on Exercise – Work with secure socket command, Handle null values in the data, perform error handling, send email, get row filtered data, set stream priorities

Deploying SCD

Understanding Slowly Changing Dimensions, making ETL dynamic, dynamic transformation, creating folders, scripting, bulk loading, file management, working with Pentaho file transfer, Repository, XML, Utility and File encryption.

Hands-on Exercise – Make ETL dynamic transformation, create folders, write scripts, load bulk data, perform file management ops, work with Pentaho file transfer, XML utility and File encryption

Type of Repository in Pentaho

Creating dynamic ETL, passing variable and value from job to transformation, deploying parameter with transformation, importance of Repository in Pentaho, database connection, environmental variable and repository import.

Hands-on Exercise – Create dynamic ETL, pass variable and value from job to transformation, deploy parameter with transformation, connect to a database, set pentaho environmental variables, import a repository in the pentaho workspace

Pentaho Repository & Report Designing

Working with Pentaho dashboards and reports, the effect of row banding, designing a report, working with Pentaho Server, creation of line, bar and pie charts in Pentaho, how to achieve localization in reports

Hands-on Exercise – Create a Pentaho dashboard and report, check the effect of row banding, design a report, work with Pentaho Server, create line, bar and pie charts in Pentaho, implement localization in a report

Pentaho Dashboard

Working with Pentaho Dashboard, passing parameters in Report and Dashboard, drill-down of Report, deploying Cubes for report creation, working with Excel sheet, Pentaho data integration for report creation.

Hands-on Exercise – Pass parameters in Report and Dashboard, deploy Cubes for report creation, drill-down in report to understand the entries, import data from an excel sheet, Perform data integration for report creation

Understanding Cube

What is a Cube? Creation and benefit of Cube, working with Cube, Report and Dashboard creation with Cube.

Hands-on Exercise – Create a Cube, create report and dashboard with Cube

Multi Dimensional Expression

Understanding the basics of Multidimensional Expressions (MDX): tuples and their implicit dimensions, MDX sets, levels, members, dimension referencing, hierarchical navigation and metadata.

Hands-on Exercise – Work with MDX; use MDX sets, levels, members, dimension referencing, hierarchical navigation and metadata

Pentaho Analyzer

Pentaho analytics for discovering, blending various data types and sizes, including advanced analytics for visualizing data across multiple dimensions, extending Analyzer functionality, embedding BA server reports, Pentaho REST APIs

Hands-on Exercise – Blend various data types and sizes, Perform advanced analytics for visualizing data across multiple dimensions, Embed BA server report

Pentaho Data Integration (PDI) Development

Knowledge of the PDI steps used to create an ETL job, Describing the PDI / Kettle steps to create an ETL transformation, Describing the use of property files

Hands-on Exercise – Create an ETL transformation using PDI / Kettle steps, Use property files
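Property files let a transformation pick up connection details from outside the job definition, the role `kettle.properties` plays in PDI. A sketch using Python's standard `configparser` — the keys and the PostgreSQL URL shape are hypothetical:

```python
import configparser
import io

# Stand-in for a property file on disk (keys are illustrative)
properties = io.StringIO("""
[DEFAULT]
db_host = localhost
db_port = 5432
db_name = warehouse
""")

config = configparser.ConfigParser()
config.read_file(properties)

# Build a connection string from externalized properties, so the
# same job runs unchanged across dev / QA / production
jdbc_url = "jdbc:postgresql://{db_host}:{db_port}/{db_name}".format(
    **config.defaults())
```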

Hadoop ETL Connectivity

Deploying ETL capabilities for working on the Hadoop ecosystem, integrating with HDFS and moving data from local file to distributed file system, deploying Apache Hive, designing MapReduce jobs, complete Hadoop integration with ETL tool.

Hands-on Exercise – Deploy ETL capabilities for working on the Hadoop ecosystem, Integrate with HDFS and move data from local file to distributed file system, deploy Apache Hive, design MapReduce jobs
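The word-count MapReduce job mentioned above follows the classic two-phase pattern: a mapper emits `(word, 1)` pairs and a reducer sums the counts per word. Here both phases are simulated in one process; on a real cluster (e.g. via Hadoop Streaming) each phase reads stdin and writes stdout, and the framework sorts pairs by key between the phases:

```python
from collections import defaultdict

def mapper(lines):
    """Map phase: emit (word, 1) for every word in the input."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: sum the counts for each word."""
    counts = defaultdict(int)
    for word, n in pairs:  # Hadoop would deliver these grouped by key
        counts[word] += n
    return dict(counts)

counts = reducer(mapper(["Hello Hadoop", "hello ETL"]))
```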

Creating dashboards in Pentaho

Creating interactive dashboards that present data in a highly graphical form to improve key business performance.

Hands-on Exercise – Create interactive dashboards for visualizing graphical representation of data

Performance Tuning

Managing BA server logging, tuning Pentaho reports, monitoring the performance of a job or a transformation, Auditing in Pentaho

Hands-on Exercise – Manage logging in BA server, Fine tune Pentaho report, Monitor the performance of an ETL job


Pentaho Security

Integrating user security with other enterprise systems, extending BA server content security, securing data, Pentaho’s support for multi-tenancy, using Kerberos with Pentaho

Hands-on Exercise – Configure security settings to implement high level security

Informatica Projects

Project 1 : General Manager Insight – Dash Board

Client : Cisco

Technology : Teradata, Informatica, SQL

Cisco is the worldwide leader in networking that transforms how people connect, communicate and collaborate. Its current portfolio of products and services is focused on three market segments: Enterprise and Service Provider, Small Business and the Home. The solutions for each market are segmented into architectures, which form the basis for how Cisco approaches each market.

Description : The GMI dashboard is a graphical representation of the performance of various Cisco business segments. The dashboard provides a pictorial view of summary and detailed data about various subject areas across dimensions such as entity level, product level and region level. It covers business measures such as bookings, revenue, gross margin and product forecast, which are used by managers at different levels of the Cisco management hierarchy.

Project 2 : Deploying Informatica ETL for business intelligence

Industry : General

Problem Statement : Disparate data needs to be converted into insights using Informatica

Topics : In this Informatica project you have access to all environments, such as dev, QA, UAT and production. You will first configure the repositories in the various environments. You will receive the requirement from the client through a source-to-target mapping sheet. You will extract data from various source systems and load it into staging. From staging it will go to the operational data store, from there to the enterprise data warehouse, and from there it will be directly used for generating reports and deriving business insights.

Highlights :

  • Access data from multiple sources
  • Manage current & historic data with SCD
  • Import source & target tables

Case Study – 1

Project: Banking products augmentation

Industry: Banking

Problem Statement: How to improve the profits of a bank by customizing the products and adding new products based on customer needs.

Topics: In this Informatica project you will construct a multidimensional model for the bank. You will create a set of diagrams depicting the star-join schemas needed to streamline the products as per customer requirements. You will implement slowly changing dimensions, understand the customer/account relationship and create diagrams describing the hierarchies. You will also recommend heterogeneous products for the customers of the bank.

Highlights :

  • Deploy star-join schema
  • Create demographic mini-dimensions
  • Informatica Aggregator transformations

Case Study – 2

Project: Employee data integration

Industry: General

Problem Statement: How to load a table with employee data using Informatica

Topics: In this Informatica case study you will create a design for a common framework that can be used for loading and updating employee IDs and other details looked up from multiple shared tables. Your design will address the regular loading of the shared tables. You will also keep track of when the regular load runs, when the lookup requests run, the prioritization of requests if needed, and so on.

Highlights :

  • Creating multiple shared tables
  • Plug-and-play capability of the framework
  • Code and framework reusability
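The shared-table lookup framework described in this case study can be sketched minimally in Python. Everything here is hypothetical — the table shape, the `load_employees`/`lookup` names and the `"UNKNOWN"` fallback are invented to illustrate the load-then-lookup pattern:

```python
# A shared table kept in memory for the sketch; in the real case
# study this would be a database table shared by multiple consumers
employee_table = {}

def load_employees(batch):
    """Regular load: refresh the shared employee table in bulk."""
    employee_table.update(batch)

def lookup(emp_id):
    """Lookup request: resolve an employee ID against the shared table."""
    return employee_table.get(emp_id, "UNKNOWN")

# A regular load runs first, then lookup requests are served
load_employees({101: "alice", 102: "bob"})
name = lookup(101)
```

Prioritizing loads over queued lookups, as the design requires, would wrap these calls in a scheduler; that part is omitted here.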

SSIS Project

Project 1 : Configuration and Logging

Industry : General

Problem Statement : How to integrate data from multiple sources into SQL Server

Topics : In this SQL Server Integration Services (SSIS) project you will work extensively on integrating data from heterogeneous sources into SQL Server. As part of the project you will learn to clean and standardize data and automate administrative work. Some of the tasks you will perform are adding logs to an SSIS package, configuring it and saving the configuration to an XML file. Upon completion of the project you will have hands-on experience in handling constraints, error row configuration and event handlers.

Highlights :

  • Integrate data from heterogeneous sources
  • Working with Connection Manager
  • Deploying data modeling

OBIEE Project

Project :  Report formatting using OBIEE

Industry : General

Problem Statement : How to find the revenue generated for a business

Topics : This is an Oracle Business Intelligence project that is associated with creating complex dashboards and performing formatting of the report. You will gain hands-on experience in filtering and sorting of the report data depending on the business requirements. This project will also help you understand how to convert the data into graphs for easy visualization and analysis. As part of the project you will gain experience in calculating the subtotal and grand total in a business scenario while finding the revenue generated.

Highlights :

  • Filtering and sorting of Reports
  • Deploying visualization & analysis techniques
  • Designing an OBIEE dashboard.

Talend For Hadoop Project

Project Work

1. Project – Jobs

Problem Statement – How to create a job using metadata. This includes the following actions:

  • Create XML File
  • Create Delimited File
  • Create Excel File
  • Create Database Connection
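The four metadata actions above can be sketched with Python's standard library as an analogy — a delimited file, an XML file, and a database connection (an in-memory SQLite database stands in for a real DBMS, and the file names and schema are hypothetical):

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Delimited file: semicolon-separated rows written to a buffer
buf = io.StringIO()
writer = csv.writer(buf, delimiter=";")
writer.writerow(["id", "name"])
writer.writerow([1, "alice"])

# XML file: a small customers document built element by element
root = ET.Element("customers")
ET.SubElement(root, "customer", id="1").text = "alice"
xml_bytes = ET.tostring(root)

# Database connection: create a table, insert, and read back
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'alice')")
row = conn.execute("SELECT name FROM customers").fetchone()
```

An Excel file would need a third-party library (e.g. openpyxl), so it is left out of this standard-library sketch.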

2. Hadoop Projects

A. Project – Working with Map Reduce, Hive, Sqoop

Problem Statement – How to import MySQL data using Sqoop, query it using Hive, and run the word count MapReduce job.

B. Project – Connecting Pentaho with Hadoop Eco-system

Problem Statement – It includes:

  • Quick overview of ETL and BI
  • Configuring Pentaho to work with a Hadoop distribution
  • Loading data into the Hadoop cluster
  • Transforming data in the Hadoop cluster
  • Extracting data from the Hadoop cluster

Data Stage Projects

Project 1 :  Making sense of financial data

Industry :  Financial Services

Problem Statement : Extract value from multiple sources & varieties of data in the financial domain

Description : In this project you will learn how to work with disparate data in the financial services domain and come up with valuable business insights. You will deploy IBM InfoSphere DataStage for the entire Extract, Transform, Load process to leverage it for a parallel framework either on-premise or on the cloud for high performance results. You will work on big data at rest and big data in motion as well.

Highlights :

  • Creating DataStage jobs for ETL process
  • Deploying DataStage Parallel Stage Editor
  • Data Partitioning for getting consistent results

Project 2 : Enterprise IT data management

Industry :  Information Technology

Problem Statement : Software enterprises have a lot of data, and it must be made sense of in order to derive valuable insights from it

Description : This project involves working with a company’s existing data warehouse and deploying IBM DataStage onto it for the various processes of extract, transform and load. You will learn how DataStage manages high-performance parallel computing, and how it implements extended metadata management and enterprise connectivity. This also includes combining heterogeneous data.

Highlights :

  • Enforce workload & business rules
  • DataStage deployed on heterogeneous data
  • Integrating real-time data at scale.

Project 3 : Medical drug discovery and development

Industry :  Pharmaceutical

Problem Statement : A pharmaceutical company wants to speed up the process of drug discovery and development by using ETL solutions.

Description : This project deals with the domain of drug molecule discovery and development. You will learn how DataStage helps to make sense of the huge data warehouse that resides within the pharmaceutical domain, which includes data about patient history, existing molecules, the effects of existing drugs and so on. The ETL tool DataStage will help make the process of drug discovery that much easier.

Highlights :

  • Combining various types of data with ETL process
  • Converting the data and transferring it for analysis
  • Making the data ready for visualization & insights.

Project 4 :  Finding the oil reserves in ocean

Industry :  Oil and Gas

Problem Statement : Finding new oil reserves is a herculean task. Huge amounts of data need to be parsed in order to find where oil exists in the ocean. This is where an ETL tool like DataStage is needed.

Description : This project deals with deploying an ETL tool like DataStage to parse petabytes of data for discovering new oil reserves. This data could be in the form of geological data, sensor data, streaming data and so on. You will learn how DataStage can make sense of all this data.

Highlights :

  • Working with cloud or on-premise data
  • Deploying DataStage for static or streaming data
  • Converting data into the right format for analysis

Pentaho Projects

Project 1– Pentaho Interactive Report

Data– Sales, Customer, Product

Objective – In this Pentaho project you will work exclusively on creating Pentaho interactive reports for sales, customer and product data fields. As part of the project you will learn to create a data source and build a Mondrian cube, which is represented as an XML file. You will gain advanced experience in managing data sources, building and formatting Pentaho reports, changing the report template and scheduling reports.

Project 2– Pentaho Interactive Report

Domain– Retail

Objective – Build complex dashboard with drill down reports and charts for analysing business trends.

Project 3– Pentaho Interactive Report

Domain– BI

Objective – To perform automation testing in an ETL environment: check the correctness of data transformations, ensure data is loaded into the data warehouse without any loss or truncation, reject, replace and report invalid data, and create unit tests to target exceptions


Sample ETL Tools Video Tutorials


ETL Tools Certification

This training course is designed for clearing the following exams:

  • Power Center Developer Certification
  • Oracle Business Intelligence Foundation Essentials Exam
  • Talend Data Integration Certified Developer Exam
  • IBM Certified Solution Developer – InfoSphere DataStage
  • Pentaho Business Analytics Implementation
  • Cloudera Spark and Hadoop Developer Certification (CCA175) Exam

The entire course content is in line with respective certification programs and helps you clear the requisite certification exams with ease and get the best jobs in top MNCs.

As part of this training, you will be working on real-time projects and assignments that have immense implications in the real-world industry scenarios, thus helping you fast-track your career effortlessly.

At the end of this training program, there will be quizzes that perfectly reflect the type of questions asked in the respective certification exams and help you score better marks.

Intellipaat Course Completion Certificate will be awarded on the completion of the project work (after the expert review) and upon scoring at least 60% marks in the quiz. Intellipaat certification is well recognized in top 80+ MNCs like Ericsson, Cisco, Cognizant, Sony, Mu Sigma, Saint-Gobain, Standard Chartered, TCS, Genpact, Hexaware, etc.


Frequently Asked Questions on ETL Tools

Why should I learn ETL Tools from Intellipaat?

Intellipaat is a market leader in ETL tools training. Today ETL tools are increasingly used in business scenarios to efficiently derive insights from huge amounts of disparate data. The extract, transform, load process is fairly standard when it comes to getting data from diverse databases and cleansing, filtering, transforming and finally deploying it into the destination database.

This training includes some of the most powerful and efficient ETL tools like Informatica, SSIS, OBIEE, Talend, DataStage and Pentaho. The entire course content of this combo training is created toward helping you clear multiple certifications exams, viz., Power Center Developer Certification, Oracle Business Intelligence Foundation Essentials Exam, Talend Data Integration Certified Developer Exam, IBM Certified Solution Developer – InfoSphere DataStage, Pentaho Business Analytics Implementation and Cloudera Spark and Hadoop Developer Certification (CCA175) Exam.

This is a completely career-oriented training designed by industry experts. Your training program includes real-time projects and step-by-step assignments to evaluate your progress and specifically designed quizzes for clearing the requisite certification exams.

Intellipaat also offers lifetime access to videos, course materials, 24/7 support and course material upgrades to the latest version at no extra fee. Hence, it is clearly a one-time investment.

What are the different modes of training that Intellipaat provides?
At Intellipaat you can enroll either for the instructor-led online training or self-paced training. Apart from this Intellipaat also offers corporate training for organizations to upskill their workforce. All trainers at Intellipaat have 12+ years of relevant industry experience and they have been actively working as consultants in the same domain making them subject matter experts. Go through the sample videos to check the quality of the trainers.
Can I request for a support session if I need to better understand the topics?
Intellipaat offers 24/7 query resolution, and you can raise a ticket with the dedicated support team at any time. You can avail yourself of email support for all your queries. If your query is not resolved through email, we can also arrange one-on-one sessions with the trainers. You will be glad to know that you can contact Intellipaat support even after completing the training. We also do not limit the number of tickets you can raise for query resolution and doubt clearance.
Can you explain the benefits of the Intellipaat self-paced training?
Intellipaat offers self-paced training to those who want to learn at their own pace. This training also gives you the benefit of query resolution through email, one-on-one sessions with trainers, round-the-clock support and lifetime access to the learning modules on the LMS. You also get the latest version of the course material at no added cost. Intellipaat’s self-paced training is priced 75% lower than the online instructor-led training. If you face any problems while learning, we can always arrange a virtual live class with the trainers.
What kind of projects are included as part of the training?
Intellipaat offers the most updated, relevant and high-value real-world projects as part of the training program. This way you can implement what you have learned in a real-world industry setup. All training comes with multiple projects that thoroughly test your skills, learning and practical knowledge, making you completely industry-ready. You will work on highly exciting projects in domains such as high technology, ecommerce, marketing, sales, networking, banking and insurance. Upon successful completion of the projects, your skills will be considered equal to six months of rigorous industry experience.
Does Intellipaat offer job assistance?
Intellipaat actively provides placement assistance to all learners who have successfully completed the training. For this we are exclusively tied up with over 80 top MNCs from around the world. This way you can be placed in outstanding organizations such as Sony, Ericsson, TCS, Mu Sigma, Standard Chartered, Cognizant and Cisco, among other equally great enterprises. We also help you with job interview and résumé preparation.
Is it possible to switch from self-paced training to instructor-led training?
You can definitely make the switch from self-paced to online instructor-led training by simply paying the extra amount and joining the next batch of the training which shall be notified to you specifically.
How are Intellipaat verified certificates awarded?
Once you complete the Intellipaat training program along with all the real-world projects, quizzes and assignments, and upon scoring at least 60% in the qualifying exam, you will be awarded the Intellipaat verified certification. This certificate is well recognized in Intellipaat affiliate organizations, which include over 80 top MNCs from around the world that are also part of the Fortune 500 list of companies.
Will the Job Assistance Program Guarantee Me a Job?
In our job assistance program we will help you land your dream job by sharing your résumé with potential recruiters, assisting you with résumé building and preparing you for interview questions. Intellipaat training should not be regarded as a job placement service or as a guarantee of employment, as the entire hiring process takes place directly between the learner and the recruiting companies, and the final selection always rests with the recruiter.