Once you collect all the required data, you will realize that the data is diverse and mixed up. Everything is tracked differently across different files. There are even separate files that are from a specific category that is all the data for just a single year. Sorting through all of that can seem extremely confusing and difficult but Tableau Prep can help take care of this. You will learn about everything there is to know about Tableau Prep in this tutorial.
Check out this Tableau Training for the Beginners video tutorial:
What is Tableau Prep?
Tableau Prep helps with the cleaning, shaping, and organization of data before it can be analyzed. It can gather a considerable amount of data from varied sources and transform them. Its simple drag-and-drop features quickly streamline the complicated tasks of pivots, unions, joins, aggregate, etc. Once the data is clean, it can be subsequently used in the Tableau Prep output as the data source for Tableau Desktop for analysis.
There are sample datasets included in Tableau Prep like the Sample Superstore that you can use to create a flow for Please note that the latest version of Tableau Prep Builder will be used here. Results may vary for previous versions.
Tableau Prep Builder Download and Installation
To download and install Tableau Prep Builder, you will need a Creator product key and the installer. Go to the Customer Portal for the most latest and updated version of Tableau Prep Builder.
The installer can be downloaded from the Product Downloads section according to your operating system. You will get the installers from the Product Downloads and Release Notes page. There is also a free trial version available.
Please note that Tableau Prep Builder is specifically built to work with Tableau Desktop. So, make sure to install it on the same system that runs Tableau Desktop and not the one that is running Tableau Server. Tableau Server Resource Manager (SRM) cannot differentiate between the Tableau Prep protocol server process and the Tableau Server protocol server process. SRM may end the protocol server process of Tableau Prep Builder in case the computer resources are exhausted, and this does not have a recovery mechanism.
Tableau Prep Builder automatically creates a My Tableau <application> Repository folder structure in the documents folder. If you want to move this repository to another location, you can specify the desired location and point Tableau Prep Builder to the new folder. However, changing the location does not move the files in the original repository.
Learn all about Tableau in the Tableau Certification offered by Intellipaat.
Tableau Prep Tutorial
For this Tableau Prep Builder tutorial, you will use the Superstore data, which can be downloaded from here.
When you open Tableau Prep, you will see the following screen with the Connections pane:
First, you will have to connect to a data source. This will create an input step. You can choose to connect to over 25 data sources with Tableau Data Prep. Go to Connections, select Microsoft Excel, and select the downloaded Superstore data file. Once the file is imported, you will see the data as shown below:
There are various sheets on the left-hand side in the source file (or Excel file, in this case). You have to drag Orders from the left into the white area on the right, and you will see a list as shown below:
You can select any field/fields you want from here, but for the purpose of this tutorial, you will select Category, Order Date, Customer ID, Product ID, and Sales.
Once your required fields are selected, you will aim to find a customer’s first purchase date, the sales, categories, and the total distinct Product IDs. You will see different operations to choose from when you expand Order at the top window. Then, you will select Aggregate as highlighted in the image below:
Now, you will drag and drop the Customer ID to Grouped Fields on the right and the Order Date to Aggregated Fields.
Now, you need to choose the Aggregation level for the Order Date as Minimum (see below).
Similarly, you will add a new Aggregate (Aggregate 2) from the source orders. This time, Order Date and Customer ID should be brought into the Grouped Fields and Product ID, Sales, and Category into the Aggregated Fields.
Learn about Tableau Desktop in our Tableau Tutorial for Beginners!
You need to make sure to choose the Aggregation level for Sales as Sum and for Product ID and Category as Count Distinct.
Now, let’s learn how to apply the Join operation. In this case, joining both the aggregated results will give you the first Purchase Date, Count of Products, Sales, and Categories. So, you will select Join from Aggregate 2 as shown below:
Now, you need to bring Aggregate 1 to Join 1, and you will see the following changes:
The next step will be to add the Join condition on Customer ID and Order Date.
Now, you have all the fields required as well as the duplicate ones for Customer ID and Order Date. You can do away with the duplicates in the cleaning step.
For clarity, it is better to rearrange the data and rename the fields—Order Date as 1st Purchase Date, Product ID as 1st Purchase Products, Sales as 1st Purchase Sales, and Category as 1st Purchase Categories.
This gives you Customer IDs with their first date of purchase, sales, and the total distinct products and categories.
Next, you will apply another Join operation on Aggregate 1 and Aggregate 2, but this time, with slightly different criteria. It will have a matching Customer ID, but the Order Dates will not be the same.
Again, you will add Clean Step and remove the duplicates, Customer ID-1 and Order Date-1. Your screen will look something like this now:
Next, you will apply a new Aggregate (Aggregate 3) to Clean 2 to find the minimum Order Date as the first Order Date was removed when Join 2 was applied. You need to drag Customer ID to Grouped Fields and Order Date to Aggregated Fields and change the Order Date level to a minimum.
You will create Join 3 after Aggregate 3 and join Aggregate 2 with it on Order Date and Customer ID as given in the image below:
Now, you have the output as well as the duplicate fields, Customer ID-1 and Order Date-1. You will remove these fields after applying the Clean Step operation (Clean 3).
You will now rearrange and rename the fields, like before, for clarity—Order Date as 2nd Purchase Date, Product ID as 2nd Purchase Products, Sales as 2nd Purchase Sales, and Category as 2nd Purchase Categories.
Now, you have two datasets (outputs from Clean 1 and Clean 3) ready at hand. Now, you will combine them using Join on Customer ID as shown in the below images:
You will see a duplicate Customer ID in the Join Results panel. You will remove the duplicate entry by applying Clean Step and then rearrange the data for proper arrangement.
Your final screen will look like this:
You can export the data into either a .csv, .hyper, or .tde file using the Output operation.
Then, you need to click on Run Flow by which you will get a confirmation like below:
Now, this saved file is ready to be used in Tableau for data visualization and further analysis.
You can drop all your Tableau-related questions in our Tableau Community.