This tutorial blog covers the following :
Tableau data blending
- On a single worksheet when data is Tableau blended from multiple data sources and is attached on common dimensions that is Data blending in Tableau.
- It does not create row-level joins. And this is not a way to add novel dimensions or rows to your data.
- It should be utilized when you have related data in many data sources that you want to examine together in a single view.
- To combine the data blending tableau, you should first add one of the common dimensions from the primary data source to the view.
Learn more about the performance of the Tableau server in this insightful blog now!
Tableau data blending on a Worksheet
Follow these steps to use data from both data sources on a single worksheet in tableau blending.
- Connect to Sample – Superstore Sales (Excel) and build a view that shows Sales by Customer Segment and Product Category.
- Select Data > Connect to Data and connect to the Sales Plan spreadsheet.
- Drag the Sales Plan measure to the Level of Detail shelf.
- Right-click the Sales axis and select Add Reference Line.
- In the Reference Line dialog box, add a reference line that shows Sales Plan per cell. When finished, click OK.
- The Worksheet is now pulling data from the secondary data source (Sales Plan) to show how actual sales compared to the forecasted sales.
Get in touch with Intellipaat for comprehensive Tableau training and be a certified Tableau Professional!
Steps for Blending Data
Data blending is typically used in situations where related data is present in multiple data sources. Now, we will discuss below the various steps involved in Tableau data blending.
Step 1: Data preparation for Blending
The first thing that needs to be ensured is that the workbook has multiple data sources. After this, go to the Menu—>Data—>New Data source. Now, drag a field to the View On-screen and note that the data source from where you dragged the field will become the primary data source. Hence, with this step, we complete adding the primary data source.
Step 2: Adding the secondary data source
To add the secondary data source, it should be ensured that the secondary data source has a blended relationship with the primary data source. Now, repeat the above steps, i.e. go to Menu—>Data—>New Data Source. To figure out whether the data sources are automatically linked, look for the orange linking field icon (). To link the data sources, click on the grey broken link icon ().
Step 3: Blending the Data
Now, you can easily integrate the data from both the sources- Primary and Secondary data sources based on a common dimension. Search for a small link image next to the common dimension.
Primary and Secondary Data sources
Data blending is based on two key components- Primary data sources and Secondary data sources. Whichever data source is used first in the view, becomes the primary data source and the second one to be used in the view is the secondary data source. There is a possibility that the values can be restricted from the secondary data. This happens because only the corresponding data to the primary source appears in the view.
“View Data” with a Data blend
From the tooltip, go to the “View Data” symbol and click on it. Now, the summary data that is displayed, contains all the fields in the view, including those of the secondary data sources. Now, go to “Full Data” and you will notice that the data from the secondary source is no longer included.
How to work across blended data sources?
Few points need to be taken into account while working across blended data sources. The two main points have been discussed below:
- Aggregation- There are aggregate and non-aggregate arguments in blended data sources. Whichever field used from another data source includes a default aggregation- SUM. However, this can be changed. But since the calculations cannot mix aggregate and non-aggregate arguments, the fields from the data sources must be manually aggregated.
- Dot Notation- using Dot notation, any field which is referenced in the calculation belonging to another data source will refer to its data source.
Define Blend Relationships
Typically, a common dimension (or dimensions) must exist between the multiple data sources. This is required for Tableau to understand how to combine the data from multiple sources and is known as the ‘linking field’. Also, active linking fields are identified with the active link icon (), and the fields that have the potential to be linked are identified with the broken link icon ().
Linking field in the Primary and the Secondary source while blending the data
There are two situations that need to be taken into consideration while linking fields in the primary and secondary data source during data blending. They are discussed as follows:
- In the presence of a common dimension- Tableau automatically establishes a relationship where the primary data source and secondary data source linking fields have the same name. Post the establishment of the primary data source (i.e. when a field is already being used in the view) and the secondary data source, any of the fields existing between the two sources will display either of the link icons – ( or ). Now, if the related field from the primary data source is used in the view, the link is automatically activated.
- In the absence of a common dimension- In cases where there is no common dimension, it is required to rename one of them and make the dimensions the same. On renaming, Tableau identifies it as a common dimension and establishes the link.
What is the difference between data joins and data blending?
Technically speaking, both these methods/techniques enable the user to combine data from multiple sources. But they are slightly different from each other. Below discussed are some of the differences:
- Data Blending can allow combining data even from multiple data sources to be linked. On the other hand, data joins can only work with data from the same source.
- Due to the huge amount of data sets involved, using data blending is the only choice available to join tables. On the other hand, data joins are not a suitable option in this case as they can create duplicate data, which might further increase the impracticalities.
- Another major point of distinction between the two techniques is the range of operation. Data blending occurs within Tableau whereas, data joins happen even outside Tableau.
Why should you blend instead of join in Tableau?
Data blending is the most preferred method in Tableau as compared to joins. Due to the presence of two joins – left and right, in the case, of data joining, the complexities involved are huge. In data blending, no such complexities arise as just the presence of a common dimension will suffice. If more and more tables are added, joins can prove to be complex as data duplication might arise. On the other hand, no such issues arise, in the case, of data blending. With data blending, tables from different data sources can usually be joined.
Data Blending at first sight
- Data blending typically involves pulling data from different sources.
- It is used for ad hoc reporting and rapid analysis.
- Before starting the data blending process, it is important to pull out the data from its source and bring it to a centralized warehouse.
- Data blending is carried out on a sheet-by-sheet basis.
- The primary data source can easily be figured out based on the order in which the fields are used.
- Liking fields can automatically occur through shared field names. However, they can be manually linked in case the field names differ.
- The asterisks (*) indicate multiple dimension values in a single mark.
- The secondary data source operates absolutely independently of the primary data source.
Limitations of Data Blending
- Performing data blending involves working with non-additive aggregates like MEDIAN, RAWSQLAGG, and COUNTD.It is difficult to work with these aggregates as the non-additive aggregates produce results that cannot be aggregated along a dimension.
- If you wish to publish the blended data sources as one single unit, it will not be possible as data blending does not allow that. Each data source needs to be published individually.
- While performing calculations in secondary data sources, the data from such sources must always be aggregated.
- Cube Data Sources must always be the primary data source.