• Articles
  • Tutorials
  • Interview Questions

Transforming the Row Set

Tutorial Playlist

In this section of the Pentaho tutorial you will transform the row set, you will work on a tabular dataset, add the dataset, filter the rows, select values, sort rows, deploy row denormalizer, row normalizer and more.

Row Set Transformation

Enhancing a films file by converting rows to columns:

  • Download the file from the Packt website.
  • Create a transformation and read the file with a Text file input step.
  • In the Content tab of the Text file input step put : as separator. Also uncheck the Header and the No empty rows options.
  • In the Fields tab enter two string fields—feature and description. Do a preview of the input file to see if it is well configured. You should see two columns—feature with the texts to the left of the semicolons, and description with the text to the right of the semicolons.
  • Add a JavaScript step and type the following code that will create the film field:
var film;
if (getProcessCount('r') == 1) film = '';
if (feature == null)
film = '';
else if (film == '')
film = feature;

Learn Pentaho

  1. Click on the Get variables button to add to the dataset the field film.
  2. Add a Filter rows step with the condition description IS NOT NULL.
  3. With the Filter rows step selected, do a preview.

1

  1. After the filter step, add a Row denormalizer step. You can find it under the Transform category.
  2. Double-click the step and fill it like here:

1

  1. From the Utility category select an If field value is null step.
  2. Double-click it , check the Select fields option, and fill the Fields grid as follows:

1

  1. With this last step selected, do a preview. You will see this:

1

Enrich your knowledge on Pentaho Data Integration Transformation and various other functions by reading this insightful tutorial!

Calculating total scores by performances by country:

  1. Create a new transformation.
  2. Read the file with a Text file input step and do a preview to see that the step is well configured. You should see this:

1

  1. With a Select values step, keep only the following columns: Country, Performance, and totalScore.
  2. With a Sort Rows step sort the data by Country ascendant.
  3. After the Sort Rows step, put a Row denormalizer step.
  4. Double-click this last step to configure it.
  5. As the key field put Performance, and as group fields put Country.
  6. Fill the target fields’ grid like shown:

1

  1. Close the window.
  2. With the Row denormalizer step selected, do a preview. You will see this:

1

Enhancing the matches file by normalizing the dataset:

  1. Search on your disk for the file you created in Chapter 2, or download it from the Packt website.
  2. Create a new transformation and read the file by using a Text file input step.
  3. With a Split Fields step, split the Result field in two: home_t_goals and away_t_goals. (Do you remember having done this in chapter 3?)
  4. From the Transform category of steps, drag a Row Normalizer step to the canvas.
  5. Create a hop from the last step to this new one.
  1. Double-click the Row Normalizer step to edit it and fill the window as follows:

1

  1. With the Row Normalizer selected, do a preview. You should see this:

1

Creating the time dimension dataset:

  1. Create a new transformation.
  2. Press Ctrl+T to access the Transformation settings window.
  3. Select the Parameters tab and fill it like shown in the next screenshot

1

  1. Expand the Job category of steps.
  2. Drag a Get Variables step to the canvas, double-click the step, and fill the window like here:

1

  1. After the Get Variables step, add a Split Fields step and use it to split the field week_days into seven String fields named sun, mon, tue, wed, thu, fri, and sat. As Delimiter, set a comma (,).
  2. Add one more Split Fields step and use it to split the field week_days_short into seven String fields named sun_sh, mon_sh, tue_sh, wed_sh, thu_sh, fri_sh, and sat_sh. As Delimiter, set a comma (,).
  3. After this last step, add a Row Normalizer step.
  4. Double-click the Row Normalizer step and fill it as follows:

1

  1. Keep the Row Normalizer step selected and do a preview. You will see this:

1

Getting variables for setting the default starting date:

  1. Press Ctrl+T to open the transformation settings window.
  2. Add a parameter named START_DATE with default value 01/12/1999.
  3. Add a Get variables step between the Calculator step and the Filter rows step .
  4. Edit the Get variables step and a new field named start_date. Under Variable write ${START_DATE}. As Type select Date, and under Format select or type dd/MM/yyyy.
  5. Modify the filter step so the condition is now: date>=start_date and date<=31/12/2020.
  6. Modify the Select values step to remove the start_date field.
  7. With the Select values step selected do a preview. You will see this:

1

Course Schedule

Name Date Details
Pentaho Certification Training 20 Apr 2024(Sat-Sun) Weekend Batch
View Details
Pentaho Certification Training 27 Apr 2024(Sat-Sun) Weekend Batch
View Details
Pentaho Certification Training 04 May 2024(Sat-Sun) Weekend Batch
View Details

Data-Analytics-for-Business-Banner.jpg