
Introduction To Pentaho


Pentaho Data Integration is an engine along with a suite of tools responsible for the processes of extracting, transforming, and loading—best known as the ETL processes.

Pentaho Data Integration and Pentaho BI Suite:

Before introducing PDI, let’s talk about the Pentaho BI Suite. The Business Intelligence Suite is a collection of software applications intended to create and deliver solutions for decision making. The main functional areas covered by the suite are:

  • Analysis: The analysis engine serves multidimensional analysis. It’s provided by the Mondrian OLAP server and the JPivot library for navigation and exploring.
  • Reporting: The reporting engine allows designing, creating, and distributing reports in various known formats (HTML, PDF, and so on) from different kinds of sources. The reports created in Pentaho are based mainly on the JFreeReport library, but it’s possible to integrate reports created with external reporting libraries such as Jasper Reports or BIRT.
  • Data Mining: Data mining is running data through algorithms in order to understand the business and do predictive analysis. Data mining is possible thanks to the Weka Project.
  • Dashboards: Dashboards are used to monitor and analyze Key Performance Indicators (KPIs). A set of tools incorporated into the BI Suite in the latest version allows users to create interesting dashboards, including graphs, reports, analysis views, and other Pentaho content, without much effort.


  • Data integration: Data integration is used to integrate scattered information from different sources (applications, databases, files) and make the integrated information available to the final user. Pentaho Data Integration—our main concern—is the engine that provides this functionality.

All this functionality can be used standalone as well as integrated. In order to run analysis, reports, and so on integrated as a suite, you have to use the Pentaho BI Platform. The platform has a solution engine, and offers critical services such as authentication, scheduling, security, and web services.


Pentaho Data Integration:

Most of the Pentaho engines, including the engines mentioned earlier, were created as community projects and later adopted by Pentaho. The PDI engine is no exception: Pentaho Data Integration is the new name for the business intelligence tool born as Kettle.

Data cleansing:

Data cleansing is about ensuring that the data is correct and precise. This can be ensured by verifying that the data meets certain rules, discarding or correcting values that don’t follow the expected pattern, setting default values for missing data, eliminating information that is duplicated, normalizing data to conform to minimum and maximum values, and so on. Kettle makes these tasks possible thanks to its vast set of transformation and validation capabilities.
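To make these rules concrete outside of Kettle’s graphical steps, here is a minimal Python sketch of the same ideas. It is only an illustration, not how PDI implements cleansing: the function name `cleanse`, the field names (`name`, `age`), the default value, and the 0 to 120 range are all hypothetical.

```python
def cleanse(rows, default_age=0, min_age=0, max_age=120):
    """Apply four simple cleansing rules to a list of dict records.

    All field names and limits here are hypothetical examples.
    """
    seen = set()
    cleaned = []
    for row in rows:
        name = (row.get("name") or "").strip()
        # Rule 1: discard rows that do not follow the expected pattern
        # (here: the name must not be empty).
        if not name:
            continue
        # Rule 2: set a default value for missing data.
        age = row.get("age")
        if age is None:
            age = default_age
        # Rule 3: force the value into the allowed minimum/maximum range.
        age = max(min_age, min(max_age, age))
        # Rule 4: eliminate duplicated information (case-insensitive name).
        key = (name.lower(), age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


rows = [
    {"name": " Ana ", "age": 34},
    {"name": "ana", "age": 34},     # duplicate of the first row
    {"name": "", "age": 20},        # empty name, discarded
    {"name": "Luis", "age": None},  # missing age, gets the default
    {"name": "Marta", "age": 250},  # out of range, clamped to 120
]
result = cleanse(rows)
print(result)
```

In Kettle, each of these rules would typically be a separate step in a transformation rather than a line in a loop.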


Installing PDI:

In order to work with PDI you need to install the software. It’s a simple task; let’s do it.

These are the instructions to install Kettle, whatever your operating system.

The only prerequisite to install PDI is to have JRE 5.0 or higher installed.
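If you are unsure which Java version you have, the check can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes the usual `java -version` banner format, where old releases report themselves as `1.5.0_22` (meaning Java 5) and newer ones as `11.0.2`.

```python
import re
import subprocess


def java_major(version):
    """Return the major Java version from a version string.

    Old-style strings such as "1.5.0_22" carry the major version in the
    second field; newer ones such as "11.0.2" carry it in the first.
    """
    numbers = [int(n) for n in re.findall(r"\d+", version)]
    if not numbers:
        raise ValueError(f"no version number in {version!r}")
    if numbers[0] == 1 and len(numbers) > 1:
        return numbers[1]
    return numbers[0]


def meets_pdi_prerequisite(version, minimum=5):
    """True if the version string is JRE 5.0 or higher."""
    return java_major(version) >= minimum


def installed_java_version():
    """Return the version string printed by `java -version`, or None."""
    try:
        proc = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except OSError:
        return None  # no usable java executable on the PATH
    # The banner is normally written to stderr, e.g.: java version "1.5.0_22"
    match = re.search(r'version "([^"]+)"', proc.stderr + proc.stdout)
    return match.group(1) if match else None


if __name__ == "__main__":
    found = installed_java_version()
    if found is None:
        print("No Java runtime found on the PATH")
    else:
        print(found, "OK" if meets_pdi_prerequisite(found) else "too old")
```

Running `java -version` by hand in a terminal gives the same information; the script only automates the comparison.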

  1. Follow the link to Pentaho Data Integration (Kettle), or go directly to the Data Integration download page.
  2. Choose the newest stable release. At the time of writing, it is 3.2.0.
  3. Download the file that matches your platform.
  4. Unzip the downloaded file in a folder of your choice, for example C:/Kettle or /home/your_dir/kettle.
  5. If your system is Windows, you’re done. Under UNIX-like environments, it’s recommended that you make the scripts executable. Assuming that you chose Kettle as the installation folder, execute the following commands:

            cd Kettle

           chmod +x *.sh

Launching the PDI graphical designer: Spoon:

Now that you’ve installed PDI, you must be eager to do some stuff with data. That will be possible only inside a graphical environment. PDI has a desktop designer tool named Spoon.

Starting and customizing Spoon:

  1. Start Spoon.

If your system is Windows, type the following command:


On other platforms such as Unix, Linux, and so on, type:

If you didn’t make the script executable, you may type:

  2. As soon as Spoon starts, a dialog window appears asking for the repository connection data. Click the No Repository button. The main window appears. You will see a small window with the tip of the day. After reading it, close that window.
  3. A Welcome! window appears with some useful links for you to see.
  4. Close the welcome window. You can open that window later from the main menu.
  5. Click Options… from the Edit menu. A window appears where you can change various general and visual characteristics. Uncheck the checkboxes that make the repository dialog and the welcome window appear at startup.
  6. Select the Look & Feel tab.
  7. Change the Grid size and Preferred Language settings.
  8. Click the OK button.
  9. Restart Spoon in order to apply the changes. You should see neither the repository dialog nor the welcome window; the main Spoon window appears directly.


Storing transformations and jobs in a repository:

The first time you launched Spoon, you chose No Repository. After that, you configured Spoon to stop asking you for the Repository option. You must be curious about what the repository is and why we are not using it.

As said, the results of working with PDI are Transformations and Jobs. In order to save the Transformations and Jobs, PDI offers two methods:

  • Repository: When you use the repository method you save jobs and transformations in a repository. A repository is a relational database specially designed for this purpose.
  • Files: The files method consists of saving jobs and transformations as regular XML files in the filesystem, with the extensions kjb and ktr respectively.
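Because the files method produces plain XML, a saved transformation can be inspected with any XML parser. The sketch below is a hedged illustration: the sample document is hand-written and heavily simplified, and the element names (`transformation`, `info`, `step`, `name`, `type`) are assumptions about the typical layout of a ktr file, so check them against a file saved by your own Spoon version.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for the contents of a .ktr file.
# Real files contain many more elements; this layout is an assumption.
SAMPLE_KTR = """<transformation>
  <info><name>hello_world</name></info>
  <step><name>Read input</name><type>TextFileInput</type></step>
  <step><name>Write output</name><type>TextFileOutput</type></step>
</transformation>"""


def transformation_name(ktr_xml):
    """Return the transformation name stored under info/name."""
    root = ET.fromstring(ktr_xml)
    return root.findtext("info/name")


def list_steps(ktr_xml):
    """Return (name, type) pairs for every top-level step element."""
    root = ET.fromstring(ktr_xml)
    return [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]


name = transformation_name(SAMPLE_KTR)
steps = list_steps(SAMPLE_KTR)
print(name)
for step_name, step_type in steps:
    print(f"  {step_name} ({step_type})")
```

To try this on a real file, read its text with `open(path, encoding="utf-8").read()` and pass the result to the same functions. Being able to read, diff, and version these files with ordinary tools is one practical advantage of the files method.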

Why did we choose not to work with a repository, or in other words, to work with files? This is mainly for the following two reasons:

  • Working with files is more natural and practical for most users.
  • Working with a repository requires at least minimal database knowledge and access to a database engine from your computer. Meeting both preconditions would allow you to learn both methods. However, it’s probable that you don’t meet them yet.

Installing MySQL on Windows:

In order to install MySQL on your Windows system, please follow these instructions:

  1. Open an internet browser and go to the MySQL download page.
  2. Select the Microsoft Windows platform and download the mysql-essential package that matches your system: 32-bit or 64-bit.
  3. Double-click the downloaded file. A wizard will guide you through the process.
  4. When asked about the setup type, select Typical.
  5. Several screens follow. When the wizard is complete you’ll have the option to configure the server. Check Configure the MySQL Server now and click Finish.
  6. A new wizard will be launched that lets you configure the server.
  7. When asked about the configuration type, select Standard Configuration.
  8. When prompted, set the Windows options as shown in the next screenshot:


  9. When prompted for the security options, provide a password for the root user. You’ll have to retype the password.
  10. In the next window click on Execute to proceed with the configuration.
  11. When the configuration is done, click on Finish. After installing MySQL it is recommended that you install the GUI tools for administering and querying the database.
  12. Open an internet browser and go to the download page for the MySQL GUI tools.
  13. Look for the Windows downloads and download the Windows (x86) package.
  14. Double-click the downloaded file. A wizard will guide you through the process.
  15. When asked about the setup type, select Complete.
  16. Several screens follow. Just follow the wizard instructions.
  17. When the wizard ends, you’ll have the GUI tools added to the MySQL menu.

