
ETL Tools Training

Master Program

Our ETL tools master’s program will help you gain proficiency in top ETL tools such as Informatica, SSIS, OBIEE, Talend, DataStage and Pentaho. You will work on real-world projects in data warehousing, data integration, Hadoop connectivity, data modeling, slowly changing dimensions (SCD) and data schemas.

  • 6+

    Courses

  • 11+

    Projects

  • 146

    Hours

  • Self Paced Training

    • Informatica
    • SSIS
    • OBIEE
    • Talend
    • DataStage
    • Pentaho

Key Features

146 Hrs Self-paced Videos
292 Hrs Project work & Exercises
Certification and Job Assistance
Flexible Schedule
Lifetime Free Upgrade
24 x 7 Lifetime Support & Access

Course Fees

Self Paced Training

  • 146 Hrs e-learning videos
  • Lifetime Free Upgrade
  • 24 x 7 Lifetime Support & Access
$1,099
$0

Corporate Training

  • Customized Learning
  • Enterprise grade learning management system (LMS)
  • 24x7 support
  • Strong Reporting

About ETL Tools Training Course

This Intellipaat training in ETL tools will give you a powerful head start with the extract, transform and load (ETL) processes that cater to the Business Intelligence domain. This all-in-one course covers six of the most powerful ETL tools, and upon completion of the training you will be a certified ETL professional.

List of Courses Included

  • Informatica
  • SSIS
  • OBIEE
  • Talend
  • DataStage
  • Pentaho

What you will learn in this training:

  1. Introduction to ETL and its importance in data warehousing
  2. Setting up and installation of various ETL tools
  3. Optimizing ETL tools for best results based on specific job requirements
  4. OLAP, ETL and OLTP systems
  5. Data modeling and dimensional modeling
  6. Working with star and snowflake schemas
  7. Dimensions, measures and fact tables in data warehousing
  8. Data types and performance tuning in ETL
  9. SCD types and various methods of handling SCD data
  10. Various types of data transformation techniques
  11. Source qualifier transformation and mappings

Who should take up this course:

  • ETL and SQL Developers and Business Intelligence Professionals
  • Database, Data Warehousing and Mainframe Professionals and Project Managers

Anybody can take up this training course. However, having a basic knowledge of SQL can be helpful.

The process of ETL is of absolute importance in any data warehousing and Business Intelligence scenario. Getting the ETL process right has a direct bearing on the data that is loaded into the data warehouse, which in turn affects the quality of the Business Intelligence derived and, finally, the business insights that are reached. This Intellipaat training is your one-stop destination for mastering some of the best ETL tools available in the market today. Upon completion of this course, you can command the best salaries in the ETL, data warehousing and Business Intelligence domains in top MNCs around the world.


ETL Tools Training Review


Purvi Narang

Big Data Expert at Wipro

The training was very informative and helped me a lot in understanding ETL tools. I completed this training in a day, and that too in detail. The course content covers everything from basic knowledge to advanced applications. This course surely helped me enhance my knowledge.

Ruchita Vijay

BI Team Lead & Architect at MakeMyTrip

Complex terms are explained in a very elegant way. The trainers they provide are very knowledgeable and make sure that you enjoy the course while learning. Intellipaat provided very interactive sessions. Thank you very much for presenting such valuable information. It really helped me in understanding ETL tools.

ETL Tools Course Content

Data Warehousing and Cleansing Concepts

What is data warehousing, understanding the extract, transform and load processes, what is data aggregation, data scrubbing and data cleansing and the importance of Informatica PowerCenter ETL

Informatica Installation and Configuration

Configuring the Informatica tool, how to install Informatica, operational administration activities and integration services

Hands-on Exercise: Step-by-step process for the installation of Informatica PowerCenter

Working with Active and Passive Transformations

Understanding the difference between active and passive transformations and the highlights of each transformation

Working with Expression Transformation

Learning about expression transformation and connected passive transformation to calculate value on a single row

Hands-on Exercise: Calculate value on a single row using connected passive transformation

Working with Sorter, Sequence Generator and Filter Transformation

Different types of transformations like sorter, sequence generator and filter, the characteristics of each and where they are used

Hands-on Exercise: Transform data using the filter technique, use a sequence generator and use a sorter
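To make these three transformations concrete outside Informatica, here is a minimal pandas sketch (with purely illustrative column names) that filters rows, sorts them and adds a surrogate key the way a sequence generator would:

```python
import pandas as pd

# Sample input rows; the column names are illustrative only.
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "amount":   [250.0, 75.0, 990.0, 40.0],
})

# Filter transformation: keep only rows that satisfy a condition.
filtered = orders[orders["amount"] >= 100]

# Sorter transformation: order the remaining rows.
sorted_rows = filtered.sort_values("amount", ascending=False).reset_index(drop=True)

# Sequence generator: attach a monotonically increasing surrogate key.
sorted_rows["seq_key"] = range(1, len(sorted_rows) + 1)

print(sorted_rows)
```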

Working with Joiner Transformation

Working with joiner transformation to bring data from heterogeneous data sources

Hands-on Exercise: Use joiner transformation to bring data from heterogeneous data sources

Working with Ranking and Union Transformation

Understanding the ranking and union transformation, the characteristics and deployment

Hands-on Exercise: Perform ranking and union transformation

Syntax for Rank and Dense Rank

Learn the rank and dense rank functions and the syntax for them

Hands-on Exercise: Perform rank and dense rank functions
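As an illustration of the difference between the two functions, the short pandas sketch below (sample data assumed) shows that with tied values rank skips the next position while dense rank does not:

```python
import pandas as pd

scores = pd.DataFrame({"employee": ["A", "B", "C", "D"],
                       "sales":    [500, 500, 300, 200]})

# 'min' ranking skips positions after ties (1, 1, 3, 4),
# while 'dense' ranking does not (1, 1, 2, 3).
scores["rank"] = scores["sales"].rank(method="min", ascending=False).astype(int)
scores["dense_rank"] = scores["sales"].rank(method="dense", ascending=False).astype(int)

print(scores)
```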

Router Transformation

Understanding how router transformation works and its key features

Hands-on Exercise: Perform router transformation

Source Qualifier Transformation and Mappings

Lookup transformation overview and different types of lookup transformations: connected, unconnected, dynamic and static

Hands-on Exercise: Perform lookup transformations: connected, unconnected, dynamic and static

Slowly Changing Dimension in Informatica

What is SCD, XML processing, how to handle a flat file, listing and defining various transformations, implementing ‘for loop’ in PowerCenter, the concepts of pushdown optimization and partitioning, constraint-based loading and incremental aggregation

Hands-on Exercise: Load data from a flat file, implement ‘for loop’ in PowerCenter, use pushdown optimization and partitioning, do constraint-based data loading and use incremental aggregation technique to aggregate data

Mapplet and Loading to Multiple Targets

Different types of designers: Mapplet and Worklet, target load plan, loading to multiple targets and linking property

Hands-on Exercise: Create a mapplet and a worklet, plan a target load and load multiple targets

Performance Tuning in Informatica

Objectives of performance tuning, defining performance tuning and learning the sequence for tuning

Hands-on Exercise: Do performance tuning by following different techniques

Repository Manager

Managing repository, Repository Manager: the client tool, functionalities of previous versions and important tasks in Repository Manager

Hands-on Exercise: Manage tasks in Repository Manager

Best Practices in Informatica

Understanding and adopting best practices for managing repository

Workflow Informatica

Common tasks in workflow manager, creating dependencies and the scope of workflow monitor

Hands-on Exercise: Create workflow with dependencies of nodes

Parameters and Variables

Define the variable and parameter in Informatica, parameter files and their scope, the parameter of mapping, worklet and session parameters, workflow and service variables and basic development errors

Hands-on Exercise: Define variables and parameters in functions, use the parameter of mapping, use worklet and session parameters and use workflow and service variables

Error Handling and Recovery in Informatica

Session and workflow log, using debuggers, error-handling framework in Informatica and failover and high availability in Informatica

Hands-on Exercise: Debug development errors, read workflow logs and use the error-handling framework

High Availability and Failover in Informatica

Configurations and mechanisms in recovery and checking health of PowerCenter environment

Hands-on Exercise: Configure recovery options and check health of PowerCenter environment

Working with Different Utilities in Informatica

Using commands: infacmd, pmrep and infasetup and processing of a flat file

Hands-on Exercise: Use commands: infacmd, pmrep and infasetup

Flat File Processing (Advanced Transformations)

Fixed length and delimited, expression transformations: sequence numbers and dynamic targeting using transaction control

Hands-on Exercise: Perform expression transformations: sequence numbers and dynamic targeting using transaction control
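To show what “fixed length” versus “delimited” means in practice, the following hedged sketch reads both flavours of flat file with pandas; the column widths and sample contents are assumptions for illustration only:

```python
import io
import pandas as pd

# Delimited flat file: fields separated by a delimiter character.
delimited = io.StringIO("id,name,amount\n1,Asha,250\n2,Ravi,75\n")
df_delim = pd.read_csv(delimited)

# Fixed-length flat file: each field occupies a fixed number of characters.
fixed_text = ("{:<2}{:<10}{:>3}\n".format(1, "Asha", 250) +
              "{:<2}{:<10}{:>3}\n".format(2, "Ravi", 75))
df_fixed = pd.read_fwf(io.StringIO(fixed_text),
                       widths=[2, 10, 3],
                       names=["id", "name", "amount"])

print(df_delim)
print(df_fixed)
```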

Dynamic Targeting

Dynamic target with the use of transaction control and indirect loading

Hands-on Exercise: Use of transaction control with dynamic target and indirect loading

Working with Java Transformations

Importance of Java transformations to extend PowerCenter capabilities, transforming data and active and passive mode

Hands-on Exercise: Use Java transformations to extend PowerCenter capabilities

Unconnected Stored Procedure Usage

Understanding the unconnected stored procedure in Informatica and different scenarios of unconnected stored procedure usage

Hands-on Exercise: Use the unconnected stored procedure in Informatica in different scenarios

Advanced Concepts in SCD

Using SQL transformation (active and passive)

Hands-on Exercise: Use SQL transformation (active and passive)

Incremental Data Loading and Aggregation

Understanding incremental loading and aggregation and comparison between them

Hands-on Exercise: Do incremental loading and aggregation
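One common way to implement incremental loading is to keep a high-watermark of the last load and pull only rows that are newer. The sketch below illustrates the idea with SQLite; the src_orders/dw_orders tables and their columns are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL, updated_at TEXT);
    CREATE TABLE dw_orders  (order_id INTEGER, amount REAL, updated_at TEXT);
    INSERT INTO src_orders VALUES (1, 100.0, '2024-01-01'),
                                  (2, 250.0, '2024-01-05'),
                                  (3,  75.0, '2024-01-09');
""")

# High-watermark: the latest timestamp already loaded into the warehouse.
watermark = con.execute(
    "SELECT COALESCE(MAX(updated_at), '1900-01-01') FROM dw_orders").fetchone()[0]

# Incremental load: copy only the rows newer than the watermark.
con.execute(
    "INSERT INTO dw_orders SELECT * FROM src_orders WHERE updated_at > ?",
    (watermark,))

# Aggregation after the incremental load: total over the warehouse table.
total = con.execute("SELECT SUM(amount) FROM dw_orders").fetchone()[0]
print("rows loaded:", con.total_changes, "total amount:", total)
```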

Constraint-based Loading

Working with database constraints using PowerCenter and understanding constraint-based loading and target load order

Hands-on Exercise: Perform constraint-based loading in a given order

XML Transformation and Active Lookup

Various types of XML transformation in Informatica and configuring a lookup as active

Hands-on Exercise: Perform XML transformation and configure a lookup as active

Profiling in PowerCenter

Understanding what data profiling in Informatica is, its significance in validating content and ensuring quality and structure of data as per business requirements

Hands-on Exercise: Create data profiling in Informatica and validate the content
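Data profiling largely comes down to computing summary statistics that expose quality issues before data is loaded. A minimal pandas sketch of that idea, using made-up customer data:

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@x.com", None, "b@y.com", "not-an-email"],
})

# Basic profile: null counts, distinct counts and null percentage per column.
profile = pd.DataFrame({
    "null_count":     customers.isna().sum(),
    "distinct_count": customers.nunique(),
    "row_count":      len(customers),
})
profile["null_pct"] = 100 * profile["null_count"] / profile["row_count"]

# Simple content validation: flag values that do not look like email addresses.
emails = customers["email"].dropna()
bad_emails = emails[~emails.str.contains("@")]

print(profile)
print("suspect emails:\n", bad_emails)
```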

Workflow Creation and Deletion

Understanding workflow as a group of instructions/commands for integration services and learning how to create and delete workflow in Informatica

Hands-on Exercise: Create and delete workflow in Informatica

Database Connection

Understanding the database connection, creating a new database connection in Informatica and understanding various steps involved

Hands-on Exercise: Create a new database connection in Informatica

Relational Database Tables

Working with relational database tables in Informatica and mapping for loading data from flat files into relational database tables

Hands-on Exercise: Create a mapping for loading data from flat files into relational database tables

LinkedIn Connection

Understanding how to deploy Informatica PowerCenter for seamless LinkedIn connectivity

Hands-on Exercise: Deploy Informatica PowerCenter for seamless LinkedIn connectivity

Connection with Sources

Connecting Informatica PowerCenter with various data sources like social media channels such as Facebook, Twitter, etc.

Hands-on Exercise: Connect Informatica PowerCenter with various data sources like social media channels such as Facebook, Twitter, etc.

Pushdown Optimization and Partitioning

Pushdown optimization for load-balancing on the server for better performance and various types of partitioning for optimizing performance

Hands-on Exercise: Optimize using pushdown technique for load-balancing on the server for better performance and create various types of partitioning for optimizing performance

Cache Management

Understanding session cache, the importance of cache creation, implementing session cache and calculating cache requirement

Hands-on Exercise: Implement cache creation and work with session cache

What projects will I be working on in this Informatica training?

Project 1: Admin Console

Problem Statement: The project includes the following actions:

  • Creation of users
  • Building roles
  • Forming groups
  • Collaboration of users, roles and groups
  • Lock handling
  • Creating sessions, workflow and worklets

Project 2: Deploying Informatica ETL for Business Intelligence

Industry: General

Problem Statement: Disparate data needs to be converted into insights using Informatica

Topics: In this Informatica project, you will have access to all environments: dev, QA, UAT and production. You will first configure the repositories in the various environments. You will receive the requirements from the client through a source-to-target mapping sheet. You will extract data from various source systems and load it into staging. From staging, the data moves to the operational data store; from there, it goes to the enterprise data warehouse, where it is used directly for generating reports and deriving business insights.

Highlights:

  • Access data from multiple sources
  • Manage current and historic data with SCD
  • Import source and target tables

Project 3: Deploying the ETL Transactions on Healthcare Data

Industry: Healthcare

Problem Statement: How to systematically load data within a hospital scenario so that it is easily available

Topics: In this Clinical Research Data Warehouse (CRDW) Informatica project, you will work on various types of data coming from diverse sources. The warehouse contains remitted claims, both approved and disapproved, for end-user reporting. You will create CRDW load schedules on daily, weekly and monthly bases.

Highlights:

  • Extracting data from multiple sources
  • Cleansing data and putting it in the right format
  • Loading the data into the CRDW

Case Study 1

Project: Banking Products Augmentation

Industry: Banking

Problem Statement: How to improve the profits of a bank by customizing the products and adding new products based on customer needs

Topics: In this Informatica case study, you will construct a multidimensional model for the bank. You will create a set of diagrams depicting the star-join schemas needed to streamline the products as per customer requirements. You will implement slowly changing dimensions, understand the customer–account relationships and create diagrams describing the hierarchies. You will also recommend heterogeneous products for the customers of the bank.

Highlights:

  • Deploy a star-join schema
  • Create demographic mini-dimensions
  • Informatica aggregator transformations

Case Study 2

Project: Employee Data Integration

Industry: General

Problem Statement: How to load a table with employee data using Informatica

Topics: In this Informatica case study, you will create a design for a common framework that can be used for loading and updating the employee ID and other lookup details for multiple shared tables. Your design will address the regular loading of shared tables. You will also keep track of when the regular load runs, when the lookup requests run, the prioritization of requests if needed and so on.

Highlights:

  • Creating multiple shared tables
  • Plug-and-play capability of the framework
  • Code and framework reusability

What is BI?

Introduction to Business Intelligence, understanding the concept of Data Modeling, Data Cleaning, learning about Data Analysis, Data Representation, Data Transformation.

ETL Overview

Introduction to ETL and the various steps involved (Extract, Transform, Load), reading a flat file of users’ email IDs, extracting the user ID from each email ID and loading the data into a database table.
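As a concrete miniature of that extract-transform-load flow, the sketch below reads a flat file of email addresses, derives the user ID from the part before the '@' and loads the result into a SQLite table; the file contents and table name are illustrative assumptions:

```python
import csv
import io
import sqlite3

# Extract: read a flat file containing one email address per row.
flat_file = io.StringIO("email\nasha@example.com\nravi@example.org\n")
rows = list(csv.DictReader(flat_file))

# Transform: derive the user ID from the portion before the '@'.
records = [(r["email"].split("@")[0], r["email"]) for r in rows]

# Load: write the transformed rows into a database table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (user_id TEXT, email TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)", records)

print(con.execute("SELECT * FROM users").fetchall())
```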

Working with Connection Managers

Introduction to Connection Managers – the logical representation of a connection, the various types of Connection Managers – flat file and database, understanding how to load faster with OLE DB, comparing the performance of OLE DB and ADO.NET, learning about Bulk Insert, working with Excel Connection Managers and identifying the problems.

Data Transformations

Learning what Data Transformation is, converting data from one format to another, understanding the concepts of Character Map, Data Column and Copy Column Transformations, Import and Export Column Transformations, Script and OLE DB Command Transformations, understanding row sampling, Aggregate and Sort Transformations, and percentage and row sampling.
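Outside SSIS, the same transformation ideas – copy column, sort, aggregate and percentage sampling – can be pictured with a short pandas sketch on made-up data:

```python
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "South", "North", "East"],
    "revenue": [120.0, 80.0, 200.0, 50.0],
})

# Copy Column transformation: duplicate a column under a new name.
sales["revenue_copy"] = sales["revenue"]

# Sort transformation: order the rows by a key column.
sales = sales.sort_values("revenue", ascending=False)

# Aggregate transformation: group and summarise.
by_region = sales.groupby("region", as_index=False)["revenue"].sum()

# Percentage sampling: keep a random 50% of the rows.
sample = sales.sample(frac=0.5, random_state=42)

print(by_region)
print(sample)
```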

Advanced Data Transformations

Understanding Pivot and Unpivot Transformations, Audit and Row Count Transformations, working with Split and Join Transformations, studying Lookup and Cache Transformations, integrating with Azure Analysis Services, the elastic nature of MSBI to integrate with the Azure cloud service and the scale-out deployment option for MSBI, working with cloud-borne data sources and query analysis. Scaling out the SSIS package, deploying for tighter windows, working with larger amounts of source data, SQL Server vNext for enhancing SQL Server features, and more choice of development languages and data types both on-premise and in the cloud.

Slowly Changing Dimensions

Understanding data that slowly changes over time, learning the process of how new data is written over old data and best practices. Detailed explanation of the three types of SCDs – Type 1, Type 2 and Type 3 – and their differences.
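To make the Type 1/Type 2 distinction concrete, here is a hedged pandas sketch of both update strategies for a single changed customer attribute; the dimension layout and column names are assumptions:

```python
import pandas as pd

# Existing dimension row and an incoming change for the same customer.
dim = pd.DataFrame({"customer_id": [7], "city": ["Pune"],
                    "effective_from": ["2023-01-01"], "current_flag": [True]})
change = {"customer_id": 7, "city": "Mumbai", "effective_from": "2024-06-01"}

# SCD Type 1: overwrite the attribute in place, losing history.
type1 = dim.copy()
type1.loc[type1["customer_id"] == change["customer_id"], "city"] = change["city"]

# SCD Type 2: expire the old row and insert a new current row, keeping history.
type2 = dim.copy()
type2.loc[type2["customer_id"] == change["customer_id"], "current_flag"] = False
type2 = pd.concat([type2, pd.DataFrame([{**change, "current_flag": True}])],
                  ignore_index=True)

# SCD Type 3 would instead add a 'previous_city' column next to the current value.
print(type1)
print(type2)
```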

Overview of Fuzzy Lookup Transformation, Lookup and Term Extraction

Understanding how Fuzzy Lookup Transformation differs from Lookup Transformation and the concept of fuzzy matching.

Concepts of Logging & Configuration

Learning about error rows configuration, package logging, defining package configuration, understanding constraints and event handlers.

What projects will I be working on in this SSIS training?

Project 1: SSIS

Problem Statement: Create a data flow task to extract data from the XLS format and store it in the SQL database, storing the subcategory- and category-wise sales in a table of the database. Once you get the output, split the dataset into two other tables. Table 1 should contain the Sales (< 100,000), Category and Subcategory columns. Table 2 should contain the Sales (> 100,000), Subcategory and Category columns. The Sales column should be sorted in both tables. Finally, divide the whole dataset in a 70:30 ratio and store the results in two different tables in the database.
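A rough pandas equivalent of that data flow (with the XLS source replaced by an in-memory frame and assumed column names) is sketched below; the resulting frames could then be persisted to database tables with `to_sql`:

```python
import pandas as pd

sales = pd.DataFrame({
    "Category":    ["Bikes", "Bikes", "Clothing", "Components"],
    "Subcategory": ["Road", "Mountain", "Jerseys", "Brakes"],
    "Sales":       [250000.0, 90000.0, 40000.0, 130000.0],
})

# Split on the sales threshold and sort each table on the Sales column.
table1 = sales[sales["Sales"] < 100000].sort_values("Sales")   # Sales < 100,000
table2 = sales[sales["Sales"] > 100000].sort_values("Sales")   # Sales > 100,000

# Percentage split: divide the whole dataset roughly 70:30.
part_70 = sales.sample(frac=0.7, random_state=1)
part_30 = sales.drop(part_70.index)

print(table1, table2, part_70.shape, part_30.shape, sep="\n")
```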

Topics: Data Flow, ODBC Setup and Connection Manager, Flat File Connection, Transformation, Import/Export Transformation, Split and Join Transformation, Merge and Union All Transformation

Highlights:

  • Creating a Data Flow Task
  • ODBC Set up and Connection Manager
  • Transformations

Case Study: SSIS

Problem Statement: Create an OLE DB connection and load data into SQL Server from Excel; create a transformation that splits people into age groups; create constraints and events in a package; create project-level and package-level parameters; extract the data in an incremental order.

Topics: Data Flow, ODBC Set up and Connection Manager, Transformation, Split & Join Transformation, Term Extraction and Lookup

Highlights:

  • ODBC Set up and Connection Manager
  • Transformations

Data Modeling Concepts

Introduction to OBIEE, installation of OBIEE, what data models are and why you need them, the scope, reach and benefits of data modeling, data warehousing, a sample OBIEE report, the business requirement intrinsic in data modeling, various case studies, the data modeling implications and the impact of data modeling on Business Intelligence.

Business Intelligence Concepts

Introduction to Business Intelligence, the architecture of data flow, OBIEE architecture, stack description of BI technology, BI Server, BI Scheduler, displaying reports with data, the need for reporting in business, the distinction between OLTP and OLAP, the BI platform in the BI technology stack, the product and dimension hierarchy, multidimensional and relational analytical processing, types of reports and multidimensional modeling.

Understanding OLAP

Online Analytical Processing, the OBIEE admin tools, RPD, important concepts and terminology, the significance of OLAP in the Business Intelligence life cycle, understanding various data schemas like star, snowflake and constellation, designing with the star schema, creation of the physical layer and a simple RPD, the enterprise information model, and aggregate and calculated measures.

Oracle Business Intelligence Suite

Introduction to Oracle Business Intelligence Enterprise Edition, overview of the OBIEE product, the architecture of OBIEE, key features and components, creating a simple report, business model, hierarchy, presentation and mapping.

Oracle BI Repository

Understanding what the Oracle Business Intelligence Repository is, installation of OBIEE on a Windows system, directory structure installation, services, analytics and interactive reporting, dashboard creation, multiple report creation, formula editing and altering column properties.

BI Repository Business Model

Understanding how to build the Business Model and Mapping Layer in the BI Repository, creating the Presentation Layer, data formatting, conditional formatting, removing filters, saving the report, working with shared and personal folders and creating new folders.

Business Model Continued

Working with the Enterprise Manager, testing and validating the Repository, cache disabling, dashboard prompts, filtering, editing dashboards with action links and the waterfall model.

Working with Repository

Working with the Repository, creating a test report, adding calculations, deploying OBIEE analysis, coming up with the landing page UI and its features, repository variables, and session and presentation variables.

BI Presentation Catalog

Learning about the Oracle BI Presentation Catalog, accessing and managing objects, report archiving and exporting, data grouping and limiting in analyses, data formatting, conditional formatting, master–detail reports, report creation with multiple subject areas, data mashup, Visual Analyzer, performance tiles, BI functionality, the waterfall model, graphs, pivot tables, pie charts and KPI watchlists.

Dashboard Creation

The OBIEE dashboard setup, basics of dashboards and dashboard pages, deploying Dashboard Builder for building dashboards, editing, sharing and saving dashboard analyses, cache creation and clearing, ODBC functions in OBIEE, Logical Table Source, and summary and detail reports.

OBIEE Security & Management

Securing the Oracle Business Intelligence Suite with Enterprise Manager, creating alerts, managing grouping and maintenance, administration, the various types of security in OBIEE, object-, task- and folder-level security, and report scheduling.

What projects will I be working on in this OBIEE training?

Project: Report formatting using OBIEE

Industry: General

Problem Statement: How to find the revenue generated for a business

Topics: This is an Oracle Business Intelligence project associated with creating complex dashboards and formatting reports. You will gain hands-on experience in filtering and sorting report data depending on the business requirements. This project will also help you understand how to convert the data into graphs for easy visualization and analysis. As part of the project, you will gain experience in calculating the subtotal and grand total in a business scenario while finding the revenue generated.

Highlights:

  • Filtering and sorting of Reports
  • Deploying visualization & analysis techniques
  • Designing an OBIEE dashboard.

Getting started with Talend

Working of Talend, introduction to Talend Open Studio and its usability, and what metadata is.

Jobs

Creating a new Job, the concept and creation of a delimited file, using metadata and its significance, what propagation is, the data integration schema, creating Jobs using tFilterRow and string filters, and creating an input delimited file.

Overview of Schema and Aggregation

Job design and its features, what tMap is, data aggregation, introduction to tReplicate and its working, the significance and working of tLogRow, and tMap and its properties.

Connectivity with Data Source

Extracting data from the source, source and target in a database (MySQL), creating a connection, and importing a schema or metadata.

Getting started with Routines/Functions

Calling and using functions, what routines are, the use of XML files in Talend, the working of format data functions and what type casting is.

Data Transformation

Defining a context variable, learning parameterization in ETL, writing an example using tRowGenerator, defining and implementing sorting, what the Aggregator is, using tFlow for publishing data and running a Job in a loop.

Connectivity with Hadoop

Learning to start the Thrift Server, connecting the ETL tool with Hadoop, defining the ETL method, implementation of Hive, data import into Hive with an example, an example of partitioning in Hive, the reason behind not overwriting the customer table, components of ETL, Hive vs. Pig, data loading using a demo customer, the ETL tool and parallel data execution.

Introduction to Hadoop and its Ecosystem, Map Reduce and HDFS

Big Data, the factors constituting Big Data, Hadoop and the Hadoop ecosystem, MapReduce – the concepts of map, reduce, ordering, concurrency, shuffle and reducing, Hadoop Distributed File System (HDFS) concepts and their importance, a deep dive into MapReduce – the execution framework, partitioner, combiner, data types and key–value pairs, and an HDFS deep dive – architecture, data replication, NameNode, DataNode, data flow, parallel copying with DistCp and Hadoop archives.
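The canonical illustration of the map, shuffle and reduce steps is word count. Below is a minimal Hadoop-Streaming-style sketch in plain Python; the shuffle (grouping by key) is simulated in memory, whereas Hadoop would perform it between the two phases:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    for word, counts in grouped.items():
        yield word, sum(counts)

lines = ["ETL moves data", "data feeds BI", "BI needs data"]

# Shuffle: group all mapped values by key (done by the framework in Hadoop).
grouped = defaultdict(list)
for word, one in map_phase(lines):
    grouped[word].append(one)

print(dict(reduce_phase(grouped)))
```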

Hands on Exercises

Installing Hadoop in pseudo-distributed mode, understanding important configuration files, their properties and daemon threads, and accessing HDFS from the command line.

MapReduce – basic exercises, understanding the Hadoop ecosystem, and introductions to Sqoop, Hive, Pig, Oozie, Flume and YARN with their use cases and installation.

Mini Project – Importing MySQL data using Sqoop and querying it using Hive

Deep Dive in Map Reduce

How to develop a MapReduce application, writing unit tests, best practices for developing and debugging MapReduce applications, and joining data sets in MapReduce.

Hive

A. Introduction to Hive

What Hive is, Hive schema and data storage, comparing Hive to traditional databases, Hive vs. Pig, Hive use cases and interacting with Hive.

B. Relational Data Analysis with Hive

Hive databases and tables, basic HiveQL syntax, data types, joining data sets, common built-in functions, and a hands-on exercise: running Hive queries on the shell, scripts and Hue.

C. Hive Data Management

Hive data formats, creating databases and Hive-managed tables, loading data into Hive, altering databases and tables, self-managed tables, simplifying queries with views, storing query results, controlling access to data, and a hands-on exercise: data management with Hive.

D. Hive Optimization

Understanding query performance, partitioning, bucketing and indexing data.

E. Extending Hive

Topics: User-Defined Functions

F. Hands-on Exercises – Playing with huge data and querying extensively.

G. User-Defined Functions, Optimizing Queries, and Tips and Tricks for Performance Tuning

Pig

A. Introduction to Pig

What Pig is, Pig’s features, Pig use cases and interacting with Pig.

B. Basic Data Analysis with Pig

Pig Latin syntax, loading data, simple data types, field definitions, data output, viewing the schema, filtering and sorting data, commonly used functions, and a hands-on exercise: using Pig for ETL processing.

C. Processing Complex Data with Pig

Complex/nested data types, grouping, iterating grouped data, and a hands-on exercise: analyzing data with Pig.

D. Multi-Dataset Operations with Pig

Techniques for combining data sets, joining data sets in Pig, set operations, splitting data sets and a hands-on exercise.

E. Extending Pig

Macros and imports, UDFs, using other languages to process data with Pig, and a hands-on exercise: extending Pig with streaming and UDFs.

F. Pig Jobs

Impala

A. Introduction to Impala

What Impala is, how Impala differs from Hive and Pig, how Impala differs from relational databases, limitations and future directions, and using the Impala shell.

B. Choosing the best (Hive, Pig, Impala)

Major Project – Putting it all together and Connecting Dots

Putting it all together and connecting the dots, working with large data sets, and the steps involved in analyzing large data.

ETL Connectivity with Hadoop Ecosystem

How ETL tools work in the big data industry, connecting to HDFS from the ETL tool and moving data from the local system to HDFS, moving data from a DBMS to HDFS, working with Hive with the ETL tool, creating a MapReduce job in the ETL tool, and an end-to-end ETL PoC showing Hadoop integration with the ETL tool.
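Outside any particular ETL tool, moving a local file into HDFS usually reduces to the standard `hdfs dfs -put` command. The short sketch below simply wraps it with Python's subprocess module; the paths are placeholders and a configured Hadoop client is assumed:

```python
import subprocess

local_path = "/tmp/customers.csv"          # placeholder local file
hdfs_dir = "/user/etl/staging/customers"   # placeholder HDFS target directory

# Create the target directory (idempotent with -p) and copy the file in.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir], check=True)

# Verify that the file landed where expected.
subprocess.run(["hdfs", "dfs", "-ls", hdfs_dir], check=True)
```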

Job and Certification Support

Major project, Hadoop development, Cloudera certification tips and guidance, mock interview preparation, and practical development tips and techniques.

What projects will I be working on in this Talend for Hadoop training?

Project Work

1. Project – Jobs

Problem Statement – How to create a job using metadata. The project includes the following actions:

Creating an XML file, creating a delimited file, creating an Excel file and creating a database connection.

2. Hadoop Projects

A. Project – Working with Map Reduce, Hive, Sqoop

Problem Statement – How to import MySQL data using Sqoop, query it using Hive and run a word-count MapReduce job.

B. Project – Connecting Pentaho with Hadoop Eco-system

Problem Statement – It includes:

A quick overview of ETL and BI, configuring Pentaho to work with the Hadoop distribution, loading data into the Hadoop cluster, transforming data in the Hadoop cluster, and extracting data from the Hadoop cluster.

Information Server

Introduction to the IBM Information Server Architecture, the Server Suite components, the various tiers in the Information Server.

InfoSphere DataStage

Understanding the IBM InfoSphere DataStage, the Job life cycle to develop, test, deploy and run data jobs, high performance parallel framework, real-time data integration.

DataStage Features

Introduction to the design elements, various DataStage jobs, creating massively parallel framework, scalable ETL features, working with DataStage jobs.

DataStage Job

Understanding the DataStage Job, creating a Job that can effectively extract, transform and load data, cleansing and formatting data to improve its quality.

Parallelism, Partitioning and Collecting

Learning about data parallelism – pipeline parallelism and partitioning parallelism, the two types of data partitioning – Key-based partitioning and Keyless partitioning, detailed understanding of partitioning techniques like round robin, entire, hash key, range, DB2 partitioning, data collecting techniques and types like round robin, order, sorted merge and same collecting methods.
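These partitioning choices can be pictured in a few lines of Python. The sketch below distributes sample rows across partitions using round-robin, hash-key and range strategies; the data and range boundaries are illustrative:

```python
def round_robin(rows, n):
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)                 # deal rows out in turn
    return parts

def hash_key(rows, n, key):
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)    # same key always lands together
    return parts

def range_partition(rows, boundaries, key):
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        idx = sum(row[key] > b for b in boundaries)  # pick the range bucket
        parts[idx].append(row)
    return parts

rows = [{"id": i, "amount": i * 10} for i in range(1, 10)]
print(round_robin(rows, 3))
print(hash_key(rows, 3, "id"))
print(range_partition(rows, [30, 60], "amount"))
```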

Job Stages of InfoSphere DataStage

Understanding the various job stages – data source, transformer, final database, the various parallel stages – general objects, debug and development stages, processing stage, file stage types, database stage, real time stage, restructure stage, data quality and sequence stages of InfoSphere DataStage.

Stage Editor

Understanding the parallel job stage editors, the important types of stage editors in DataStage.

Sequential File

Working with the Sequential file stages, understanding runtime column propagation, working with RCP in sequential file stages, using the sequential file stage as a source stage and target stage.

Dataset and Fileset

Understanding the difference between dataset and fileset and how DataStage works in each scenario.

Sample Job Creation

Creating a sample DataStage job using the dataset and fileset types of data.

Properties of Sequential File stage and Data Set Stage

Learning about the various properties of Sequential File Stage and Dataset stage.

Lookup File Set Stage

Creating a lookup file set, working in parallel or sequential stage, learning about single input and output link.

Transformer Stage

Studying the Transformer Stage in DataStage, the basic working of this stage, its characteristics – single input, any number of outputs and a reject link, how it differs from other processing stages, the significance of the Transformer Editor, and the evaluation sequence in this stage.

Transformer Stage Functions & Features

Deep dive into Transformer functions – String, type conversion, null handling, mathematical, utility functions, understanding the various features like constraint, system variables, conditional job aborting, Operators and Trigger Tab.

Looping Functionality

Understanding the looping functionality in Transformer Stage, output with multiple rows for single input row, the procedure for looping, loop variable properties.
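The looping idea – several output rows produced from one input row – can be mimicked with a generator. The sketch below "explodes" a delimited phone-number field into one row per number; the field layout is an assumption:

```python
def explode_phones(rows):
    """Yield one output row per phone number found in the input row."""
    for row in rows:
        for loop_index, phone in enumerate(row["phones"].split(";"), start=1):
            yield {"customer_id": row["customer_id"],
                   "loop_index": loop_index,
                   "phone": phone.strip()}

input_rows = [
    {"customer_id": 1, "phones": "98200-11111; 98200-22222"},
    {"customer_id": 2, "phones": "90000-33333"},
]

for out in explode_phones(input_rows):
    print(out)
```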

Teradata Enterprise Stage

Connecting to the Teradata Enterprise Stage, properties of connection.

Single partition and parallel execution

Generating data using Row Generator sequentially in a single partition, configuring to run in parallel.

Aggregator Stage

Understanding the Aggregator Stage in DataStage, the two types of aggregation – hash mode and sort mode.
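The difference between the two modes can be sketched in plain Python: hash mode accumulates totals in a dictionary as rows arrive, while sort mode sorts on the grouping key first and then emits each finished group (sample rows assumed):

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

rows = [{"dept": "HR", "salary": 100}, {"dept": "IT", "salary": 300},
        {"dept": "HR", "salary": 150}, {"dept": "IT", "salary": 250}]

# Hash mode: no ordering required; keep a running total per key in memory.
hash_totals = defaultdict(int)
for row in rows:
    hash_totals[row["dept"]] += row["salary"]

# Sort mode: sort on the grouping key first, then aggregate each group in turn.
sort_totals = {dept: sum(r["salary"] for r in grp)
               for dept, grp in groupby(sorted(rows, key=itemgetter("dept")),
                                        key=itemgetter("dept"))}

print(hash_totals, sort_totals)
```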

Different Stages Of Processing

A deep dive into the various stages in DataStage and the importance of the Copy, Filter and Modify stages in reducing the number of Transformer Stages.

Parameters and Value File

Understanding Parameter Set, storing DataStage and Quality Stage job parameters and default values in files, the procedure to deploy Parameter Sets function and its advantages.

What projects will I be working on in this DataStage training?

Project 1: Making sense of financial data

Industry: Financial Services

Problem Statement: Extract value from multiple sources and varieties of data in the financial domain

Description: In this project, you will learn how to work with disparate data in the financial services domain and come up with valuable business insights. You will deploy IBM InfoSphere DataStage for the entire Extract, Transform, Load process, leveraging its parallel framework either on-premise or on the cloud for high-performance results. You will work on big data at rest as well as big data in motion.

Highlights:

  • Creating DataStage jobs for ETL process
  • Deploying DataStage Parallel Stage Editor
  • Data Partitioning for getting consistent results

Project 2: Enterprise IT data management

Industry: Information Technology

Problem Statement: Software enterprises have a lot of data, and it needs to be made sense of in order to derive valuable insights from it

Description: This project involves working with a company's existing data warehouse and deploying IBM DataStage onto it for the various processes of extract, transform and load. You will learn how DataStage manages high-performance parallel computing and how it implements extended metadata management and enterprise connectivity. This also includes combining heterogeneous data.

Highlights:

  • Enforce workload & business rules
  • DataStage deployed on heterogeneous data
  • Integrating real-time data at scale.

Project 3: Medical drug discovery and development

Industry: Pharmaceutical

Problem Statement: A pharmaceutical company wants to speed up the process of drug discovery and development using ETL solutions

Description: This project deals with the domain of drug molecule discovery and development. You will learn how DataStage helps make sense of the huge data warehouse that resides within the pharmaceutical domain, which includes data about patient history, existing molecules, the effects of existing drugs and so on. The ETL tool DataStage helps make the process of drug discovery that much easier.

Highlights:

  • Combining various types of data with ETL process
  • Converting the data and transferring it for analysis
  • Making the data ready for visualization & insights.

Project 4: Finding oil reserves in the ocean

Industry: Oil and Gas

Problem Statement: Finding new oil reserves is a herculean task. Huge amounts of data need to be parsed in order to find where oil exists in the ocean. This is where an ETL tool like DataStage is needed.

Description: This project deals with deploying an ETL tool like DataStage to parse petabytes of data for discovering new oil. This data could be in the form of geological data, sensor data, streaming data and so on. You will learn how DataStage can make sense of all this data.

Highlights:

  • Working with cloud or on-premise data
  • Deploying DataStage for static or streaming data
  • Converting data into the right format for analysis


Certification

This training course is designed for clearing the following exams:

  • PowerCenter Developer Certification
  • Oracle Business Intelligence Foundation Essentials Exam
  • Talend Data Integration Certified Developer Exam
  • IBM Certified Solution Developer – InfoSphere DataStage
  • Pentaho Business Analytics Implementation
  • Cloudera Spark and Hadoop Developer Certification (CCA175) Exam

The entire course content is in line with respective certification programs and helps you clear the requisite certification exams with ease and get the best jobs in top MNCs.

As part of this training, you will be working on real-time projects and assignments that have immense implications in real-world industry scenarios, thus helping you fast-track your career effortlessly.

At the end of this training program, there will be quizzes that perfectly reflect the type of questions asked in the respective certification exams and help you score better marks.

Intellipaat Course Completion Certificate will be awarded on the completion of the project work (after the expert review) and upon scoring at least 60% marks in the quiz. Intellipaat certification is well recognized in top 80+ MNCs like Ericsson, Cisco, Cognizant, Sony, Mu Sigma, Saint-Gobain, Standard Chartered, TCS, Genpact, Hexaware, etc.

Our alumni work at 3,000+ top companies


Course Advisor

Suresh Paritala

Solutions Architect at Microsoft, USA

A Senior Software Architect at NextGen Healthcare who has previously worked with IBM Corporation, Suresh Paritala has worked on Big Data, Data Science, Advanced Analytics, the Internet of Things and Azure, along with AI domains like Machine Learning and Deep Learning. He has successfully implemented high-impact projects.

David Callaghan

Big Data Solutions Architect, USA

An experienced Blockchain Professional who has been bringing integrated Blockchain, particularly Hyperledger and Ethereum, and Big Data solutions to the cloud, David Callaghan has previously worked on Hadoop, AWS Cloud, Big Data and Pentaho projects that have had major impact on revenues of marquee brands around the world.

Samanth Reddy

Data Team Lead at Sony, USA

A renowned Data Scientist who has worked with Google and is currently working at ASCAP, Samanth Reddy has a proven ability to develop Data Science strategies that have a high impact on the revenues of various organizations. He comes with strong Data Science expertise and has created decisive Data Science strategies for Fortune 500 corporations.

Frequently Asked Questions on ETL Tools

Why should I learn ETL Tools from Intellipaat?

Intellipaat is a market leader in ETL tools training. Today, ETL tools are increasingly used in business scenarios in order to efficiently derive insights from huge amounts of disparate data. The extract-transform-load process is pretty standard when it comes to getting data from diverse databases, cleansing, filtering, transforming and finally deploying the data into the destination database.

This training includes some of the most powerful and efficient ETL tools like Informatica, SSIS, OBIEE, Talend, DataStage and Pentaho. The entire course content of this combo training is created to help you clear multiple certification exams, viz., the PowerCenter Developer Certification, Oracle Business Intelligence Foundation Essentials Exam, Talend Data Integration Certified Developer Exam, IBM Certified Solution Developer – InfoSphere DataStage, Pentaho Business Analytics Implementation and Cloudera Spark and Hadoop Developer Certification (CCA175) Exam.

This is a completely career-oriented training designed by industry experts. Your training program includes real-time projects and step-by-step assignments to evaluate your progress and specifically designed quizzes for clearing the requisite certification exams.

Intellipaat also offers lifetime access to videos, course materials, 24/7 support and course material upgrades to the latest version at no extra fee. Hence, it is clearly a one-time investment.

At Intellipaat, you can enroll either for the instructor-led online training or for the self-paced training. Apart from this, Intellipaat also offers corporate training for organizations to upskill their workforce. All trainers at Intellipaat have 12+ years of relevant industry experience, and they have been actively working as consultants in the same domain, which makes them subject matter experts. Go through the sample videos to check the quality of our trainers.

Intellipaat offers 24/7 query resolution, and you can raise a ticket with the dedicated support team at any time. You can avail yourself of email support for all your queries. If a query does not get resolved through email, we can also arrange one-on-one sessions with the trainers. You would be glad to know that you can contact Intellipaat support even after the completion of the training. We also do not put a limit on the number of tickets you can raise for query resolution and doubt clearance.

Intellipaat offers self-paced training to those who want to learn at their own pace. This training also gives you the benefits of query resolution through email, one-on-one sessions with trainers, round-the-clock support and lifetime access to the learning modules on our LMS. You also get the latest version of the course material at no added cost. Intellipaat's self-paced training is priced 75% lower than the online instructor-led training. If you face any problems while learning, we can always arrange a virtual live class with the trainers.

Intellipaat offers the most updated, relevant and high-value real-world projects as part of the training program. This way, you can implement the learning that you have acquired in a real-world industry setup. All training comes with multiple projects that thoroughly test your skills, learning and practical knowledge, making you completely industry-ready. You will work on highly exciting projects in the domains of high technology, ecommerce, marketing, sales, networking, banking, insurance, etc. Upon the successful completion of the projects, your skills will be considered equal to six months of rigorous industry experience.

Intellipaat actively provides placement assistance to all learners who have successfully completed the training. For this, we are exclusively tied up with over 80 top MNCs from around the world. This way, you can be placed in outstanding organizations such as Sony, Ericsson, TCS, Mu Sigma, Standard Chartered, Cognizant and Cisco, among other equally great enterprises. We also help you with job interview and résumé preparation.

You can definitely make the switch from self-paced training to online instructor-led training by simply paying the extra amount and joining the next batch of the training, which will be duly notified to you.

Once you complete Intellipaat's training program, working on all the real-world projects, quizzes and assignments and scoring at least 60% marks in the qualifying exam, you will be awarded the Intellipaat verified certification. This certificate is very well recognized in Intellipaat-affiliated organizations, including over 80 top MNCs from around the world, some of which are also present in the Fortune 500 list of companies.

No. Our Job Assistance program is aimed at helping you land your dream job. It offers a potential opportunity for you to explore various competitive openings in the corporate world and find a well-paid job matching your profile. The final decision on hiring will always be based on your performance in the interview and the requirements of the recruiter.