Introduction To Pentaho

What is Pentaho?

Pentaho Data Integration (PDI) is an engine, along with a suite of tools, responsible for the processes of extracting, transforming, and loading data, best known as ETL.
Pentaho Data Integration and Pentaho BI Suite:
Before introducing PDI, let’s talk about the Pentaho BI Suite. The Business Intelligence Suite is a collection of software applications intended to create and deliver solutions for decision making.

The Main Functions of Pentaho are:

  • Analysis: Multidimensional analysis is provided by the Mondrian OLAP server together with the JPivot library, which let you navigate and explore the data.
  • Reporting: The reporting engine lets you take different sources of data and create, design, and distribute reports in formats such as PDF and HTML. Pentaho reports are produced by JFreeReport. Reports created with other reporting libraries, such as BIRT or Jasper Reports, can also be integrated.
  • Data Mining: This is the process of analyzing the data available in existing databases to come up with new insights. Various analytics algorithms can be applied in data mining.
  • Dashboards: Dashboards are used to create and monitor performance indicators, presenting them through graphs, reports, charts, and other data visualizations.
  • Data integration: This is the process of combining data from multiple sources, such as databases, files, and applications, into a consolidated whole.

You can use each of these features individually or together. For example, you can run only the reporting and analysis features, or use Pentaho as a consolidated Business Intelligence platform. The Pentaho engine also offers important services such as scheduling, authentication, and web services.

Pentaho Data Integration:

Pentaho Data Integration continues an earlier engine, Kettle, which was created as a community project. Because PDI grew out of Kettle, the tool is still often referred to as Pentaho Kettle.

Data cleansing:

Data cleansing means separating the data that is useful from the data that is not. You do this by checking whether the data meets predefined rules, looking for patterns and trends, setting approximate values for missing data, removing information that should not be present, and normalizing values that fall outside the expected minimum and maximum. With Pentaho Kettle all of this is possible thanks to the large number of transformations and validations that are available.
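
To make these rules concrete, here is a minimal shell sketch of the same kinds of checks done outside Pentaho. It is not how Kettle implements cleansing. The function name `cleanse_csv`, the three-column layout (id, country, age), the default value UNKNOWN, and the 0 to 120 age range are all assumptions made for this illustration.

```shell
# Hypothetical helper: reads CSV rows "id,country,age" (no header) on stdin
# and writes the cleansed rows to stdout.
cleanse_csv() {
  awk -F',' -v OFS=',' '
    # Rule check: discard rows whose id is not purely numeric.
    $1 !~ /^[0-9]+$/ { next }
    # Duplicate removal: keep only the first row seen for each id.
    seen[$1]++ { next }
    {
      # Missing data: substitute a default value for an empty country.
      if ($2 == "") $2 = "UNKNOWN"
      # Range normalization: clamp age into the assumed range 0 to 120.
      if ($3 + 0 < 0) $3 = 0
      if ($3 + 0 > 120) $3 = 120
      print
    }
  '
}
```

You could run it as `cleanse_csv < people.csv > people_clean.csv`, where both file names are placeholders. In PDI itself, each of these rules would typically be a separate step in a transformation.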

Installing PDI:

If you want to work with Pentaho Data Integration, you first have to install the software. This is simple.
The process for installing Pentaho Kettle is the same regardless of your operating system. The only prerequisite is that JRE 5.0 or higher is installed.

  • From http://community.pentaho.com/sourceforge/, follow the link to Pentaho Data Integration (Kettle). Alternatively, go directly to the download page at http://sourceforge.net/projects/pentaho/files/Data Integration.
  • Choose the newest stable release.
  • Unzip the downloaded file into a folder of your choice, for example C:/Kettle or /home/your_dir/kettle.
  • On a Windows system no further step is needed. In a UNIX environment you need to make the scripts executable. If you chose Kettle as the installation folder, execute the following commands:

cd Kettle
chmod +x *.sh
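
The two commands above can also be wrapped in a small pre-flight script. The sketch below is only an illustration: the function names `check_java` and `make_scripts_executable` are made up for this example, and `check_java` only confirms that a `java` command is on the PATH; it does not verify that the version is 5.0 or higher.

```shell
# Print the first line of `java -version`, or fail if java is not on the PATH.
check_java() {
  if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
  else
    echo "java not found on PATH; install a JRE first" >&2
    return 1
  fi
}

# Make every .sh file directly inside the given folder executable.
# The folder is passed as the first argument, e.g. /home/your_dir/kettle.
make_scripts_executable() {
  dir="$1"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  for f in "$dir"/*.sh; do
    # Skip the unexpanded pattern when the folder holds no .sh files.
    [ -e "$f" ] || continue
    chmod +x "$f"
  done
}
```

A typical call would be `check_java && make_scripts_executable /home/your_dir/kettle`.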

Launching the PDI graphical designer, Spoon:

Once you have installed Pentaho Data Integration, you can start working with data in a graphical environment. Spoon is the Pentaho Data Integration designer tool.

How to Start and customize Spoon:

1. Start Spoon from the installation folder.
If your system is Windows, type the following command:
Spoon.bat
If you have Unix or Linux, type the following command:
./spoon.sh
If spoon.sh is not executable, type:
sh spoon.sh
2. As soon as Spoon starts, a repository connection dialog box appears and asks for the repository connection data. Click the No Repository button. A small Tip of the Day window then appears; you can close it after you read it.
3. A welcome window appears with a few important links.
4. Close the welcome window. You can open it again later from the main menu.
5. From the Edit menu, click Options. A window appears in which you can change the visual characteristics.
6. Select the Look & Feel tab.
7. Change the Grid size and Preferred Language settings.
8. Click OK.
9. Restart Spoon for the changes to be applied. This time you won’t see the repository dialog or the welcome window.
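
On Unix or Linux, the choice in step 1 between running the script directly and running it through `sh` can be automated. The following is a minimal sketch, assuming the launcher is named spoon.sh and sits directly in the installation folder; the function name `start_spoon` is hypothetical.

```shell
# Start Spoon from the installation folder given as the first argument.
start_spoon() {
  dir="$1"
  script="$dir/spoon.sh"
  [ -f "$script" ] || { echo "spoon.sh not found in $dir" >&2; return 1; }
  if [ -x "$script" ]; then
    # The script is executable: run it directly from its own folder.
    (cd "$dir" && ./spoon.sh)
  else
    # Not executable: hand the script to the shell instead.
    (cd "$dir" && sh spoon.sh)
  fi
}
```

For example, `start_spoon /home/your_dir/kettle` would launch the designer from that folder.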

How to store transformations and jobs in a repository:

You chose No Repository when you first launched Spoon. Once that option is configured, Spoon stops asking about the repository. If you want to save jobs and transformations, you have two options.

Repository

When you use the repository, your jobs and transformations are saved in it. You can think of the repository as a relational database that is designed for this specific purpose.

Files

In this method you save jobs and transformations as ordinary XML files, each with its own file extension.

Why would you choose to work with files instead of the repository?

  • Working with files is practical for most users.
  • Working with repositories requires some database knowledge.

How to install MySQL on Windows:

To install MySQL on your Windows system, follow these instructions:
1. Open your internet browser and go to http://dev.mysql.com/downloads/mysql/.
2. Choose the Microsoft Windows platform and download the mysql-essential package that matches your system: 32-bit or 64-bit.
3. Double-click the downloaded file. A wizard will guide you through the installation.
4. Select Typical when you are asked for the setup type.
5. Continue through the remaining screens. When the wizard completes, choose to configure the server.
6. A new wizard opens to help you configure the server.
7. Choose Standard Configuration.
8. In the security options, provide a password for the root user.
9. Click Execute to proceed with the configuration.
10. Once MySQL is installed, it is a good idea to install the GUI tools so that you can monitor and query the database.
11. Open a browser and go to http://dev.mysql.com/downloads/gui-tools/.
12. Look for the Windows downloads and download the Windows (x86) package.
13. Double-click the downloaded file. A wizard will guide you through the process.
14. Choose Complete when you are asked for the setup type.
15. Follow the wizard instructions.
16. The GUI tools will now be available in the MySQL menu.

About the Author

Data Analyst & Machine Learning Associate

As a Data Analyst and machine learning associate, Nishtha combines her analytical skills and machine learning knowledge to interpret complicated datasets. She is also a passionate storyteller who transforms crucial findings into gripping tales that further influence data-driven decision-making in the business frontier.