• Articles
  • Tutorials
  • Interview Questions

Working with Databases

Steps to create database connection in Pentaho

Creating a connection with the Steel Wheels database:

  • Go to the Pentaho Download site: http://sourceforge.net/projects/pentaho/files/.
  • Under the Business Intelligence Server look for the file named pentaho_sample_data-1.7.1.zip and download it.
  • Unzip the downloaded file.
  • Run start_hypersonic.bat under Windows or start_hypersonic.sh under Unix-based operating systems. If you download the sample data, you will find these scripts in the folder named pentaho-data. If you download the Pentaho BI server instead, you will find them in the folder named data. The following screen is displayed when the database server starts:

1

  • Open Spoon and create a new transformation.
  • Click on the View option that appears in the upper-left corner of the screen.
  • Right-click the Database connections option and click on New.

1
8. Fill the Database Connection dialog window.
1

  • Click on the Test button. The following window shows up:1
  • Click on OK to close the test window.
  • Click on OK again to close the database definition window. A new database connection is added to the tree.

12. Right-click on the database connection and click on Share. The connection is available in all transformations you create from now onwards.
1

  • Save the transformation.

Exploring the sample database:

  • Open the transformation you just created.
  • Right-click the connection in the Database connections list and select Explore in the contextual menu. The Database explorer on connection window opens.
  • Expand the Tables node of the tree and select CUSTOMERS. This is how the explorer looks:

1

  • Click on the Open SQL for [CUSTOMERS] option.
  • The following SQL editor window appears:

1

  • Modify the text in the window so that you have the following:
SELECT
CUSTOMERNUMBER
, CUSTOMERNAME
, CITY
, COUNTRY
FROM CUSTOMERS
  • Click on Execute. You will see the following result:

1

  • Close the preview window (the window that tells the result of the execution) and the SQL editor window.
  • Click on OK to close the database explorer window.

A brief word about SQL:

SQL, that is, Structured Query Language is the language that lets you access and manipulate databases in a RDBMS.
SQL can be divided into two parts—DDL and DML.
The DDL, that is, Data Definition Language is the branch of the language that basically allows creating or deleting databases and tables.
DML, that is, Data Manipulation Language allows you to retrieve data from a database. It also lets you insert, update, or delete data from the database.
The statement you typed in the SQL editor is an example of DML:

SELECT
CUSTOMERNUMBER
, CUSTOMERNAME
, CITY
, COUNTRY
FROM CUSTOMERS

Learn Pentaho

Getting orders in a range of dates by using parameters:

  • Open the transformation from the previous tutorial and save it under a new name.
  • From the Input category, add a Get System Info step.
  • Double-click it and use the step to get the command line argument 1 and command line argument 2 values. Name the fields as date_from and date_to respectively. Create a hop from the Get System Info step to the Table input step.
  • Double-click the Table input step.
  • Modify the SELECT statement as follows:
SELECT
ORDERNUMBER
, ORDERDATE
, REQUIREDDATE
, SHIPPEDDATE
FROM ORDERS
WHERE STATUS = 'Shipped'
AND ORDERDATE BETWEEN ? AND ?
  • In the drop-down list to the right side of Insert data from step, select the incoming step.
  • Click OK.
  • With the Select values step selected, click the Preview button
  • Click on Configure.
  • Fill the Arguments grid. To the right of the argument 01, type 2004-12-01. To the right of the argument 02, type 2004-12-10.
  • Click OK. The following window appears:

1

Getting orders in a range of dates by using variables:

  • Open the main transformation we created in the Time for action–getting data about shipped orders section and save it under a new name.
  • Double-click the Table input step.
  • Modify the SELECT statement as follows:
SELECT
ORDERNUMBER
, ORDERDATE
, REQUIREDDATE
, SHIPPEDDATE
FROM ORDERS
WHERE STATUS = 'Shipped'
ND ORDERDATE BETWEEN '${DATE_FROM}' AND '${DATE_TO}'
  • Tick the Replace variables in script? checkbox.
  • Save the transformation.
  • With the Select values step selected, click the Preview button.
  • Click on Configure.
  • Fill the Variables grid in the settings dialog window—type 2004-12-01 to the right of the DATE_FROM option and 2004-12-10 to the right of the DATE_TO option.
  • Click OK. This following window appears:

1

Deleting data about discontinued items:

  • From the Packt website, download the LUX_discontinued.txt file.
  • Create a new transformation.
  • With a Text file input step, read the file.
  • Preview the file.

1

  • After the Text file input step, add an Add constants step to add a String constant named man_code with value LUX.
  • Expand the Output category of steps and drag a Delete step to the canvas.
  • Create a hop from the Add constants step to the Delete step.
  • Double-click the Delete step. Select js as Connection and, as Target table, browse and select products. In the grid add the conditions man_code = man_code and pro_theme LIKE category. After the Delete step, add a Write to log step.
  • Right-click the Delete step and define the error handling.
  • Save the transformation.
  • Before running the transformation, open the Database Explorer.
  • Under the js connection, locate the products table and click Open SQL for products.
  • In the simple SQL editor type:

SELECT pro_theme, pro_name FROM js.products p
ORDER BY pro_theme, pro_name;

  • Click on Execute.

1

  • Close the preview data window and the results of the SQL window.
  • Minimize the database explorer window.
  • The database explorer is collapsed at the bottom of the Spoon window.
  • Run the transformation.
  • Look at the Step Metrics. The following is what you should see:

1

  • Maximize the database explorer window.
  • In the SQL editor window click Execute again.

1

About the Author

Data Analyst & Machine Learning Associate

As a Data Analyst & Machine Learning Associate, Nishtha uses a combination of her analytical skills and machine learning knowledge to interpret complicated datasets. She is a passionate storyteller who transforms crucial findings into gripping tales that further influence data-driven decision-making in the business frontier.