Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in BI by (47.2k points)

I have been creating a .csv file using pandas to connect to Tableau. I would like to do it purely in Tableau if possible so I can outsource the daily updates to a coworker without having to teach him python/pandas.

File A: 1.2 mil rows per month, each row represents an interaction with a customer and it has the following columns which I merge together for a unique value (which is not perfect because sometimes there are multiple rows with different timestamps for the same interaction - I typically drop duplicates on these three fields combined (countd(Date+PersonID+CustomerID)):

Date | PersonID | CustomerID 

File B: 300k rows per month. This represents if the PersonID while interacting with the CustomerID has an issue, they reach out to another person. I also create a field of (Date+PersonID+CustomerID) to join with File A.

Date | PersonID | CustomerID | HelperID 

File C: 200k rows per month. This is a subset of File C with some different information but it does not have PersonID. I normally do a (Date+CustomerID) and then look up who the PersonID from File A was and that information.

Date | CustomerID | HelperID

In the end, I have an overall file (outer joins) then a smaller file where there are direct matches about the same interaction where I can say this PersonID interacted with this CustomerID and needed help from this HelperID on this date about a certain issue.

How do I approach this in Tableau directly? Should I load all three files separately and Blend on the CustomerID? Should I create the unique concatenated fields on each file and then do joins? It's not perfect because there is no index or key which directly links these files 1 to 1. I am afraid of doing a left join and inflating numbers (I notice there are duplicates sometimes and I don't know how to address it when I do joins in Tableau) 

1 Answer

0 votes
by (17.6k points)
edited by

  • Joins are more flexible and performant than blends, so if you have a choice, prefer joins.

  • To be able to use joins (or custom SQL), the data needs to be either in (potentially multiple) tabs in the same Excel spreadsheet or tables in a database. Either can be loaded from CSV.

  • The principle use case for blends is when you need to combine information drawn from different data sources -- such as comparing data in an Oracle database with other data in a spreadsheet.

  • Prior to Tableau 9, blends were also one of the few approaches for combining information drawn from the same database, but using queries that returned information at different levels of detail -- such as comparing metrics for individual store locations with metrics for the enclosing regions.

  • level of detail (LOD) calculations are introduced in Tableau 9 which handle that use case better than data blending.

Interested to learn about Tableau from top experts? Enroll in Tableau Course to get expert guidance.

Browse Categories

...