I use Tableau and have a table with 140 fields. Due to the size/width of the table, performance is poor. I would like to remove fields to increase read speed, but my user base is so large that at least one person uses each field, while 90% of users use the same ~20 fields.

What is the best solution to this issue? (Tableau is our BI tool, BigQuery is our database)

What I have done thus far: in Tableau, it isn't clear how to use dynamic data sources that change based on the fields selected. Ideally, I would like to have smaller views OR denormalized tables, and as users make their selections in Tableau, the underlying data source would switch to the table or view that contains those fields.

I have tried a simple version of a large view, but it performed worse than my large table and read significantly more data (remember, I am on BigQuery, so I care very much about bytes read because of cost).

1 Answer


There are two approaches you can take here:

Approach 1: Extract your data.

Especially for data sources that charge per byte queried (BigQuery, Athena, etc.), extracts make a great deal of sense. It depends on how 'fresh' the data must be for the users. (Of course, all users will say 'live is the only way to go', but dig into this a little and see what they actually need.) Refreshes can be scheduled as often as every 15 minutes. The real power of refreshes comes in the form of incremental refreshes, whereby only new records are added (along an index such as an integer or date column). This is a great way to reduce costs, provided your BigQuery table is partitioned (which it should be). Since Tableau extracts are stored in .hyper files, a format of Tableau's own design, they are extremely fast and well optimized for use in Tableau.
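Whether you stay live or use incremental refreshes, partitioning the BigQuery table on the column the refresh (or the live queries) filters on is what keeps bytes scanned down. Here is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, table, and field names are hypothetical placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes credentials are already configured in the environment

# Hypothetical table name; partition on the date column that the
# incremental extract refresh (or any live query) will filter on.
table_id = "my-project.analytics.wide_fact_table"

schema = [
    bigquery.SchemaField("event_date", "DATE"),
    bigquery.SchemaField("customer_id", "STRING"),
    # ... the remaining fields go here ...
]

table = bigquery.Table(table_id, schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",  # queries filtered on this column only scan matching partitions
)

table = client.create_table(table)
print(f"Created partitioned table {table.full_table_id}")
```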

Approach 2: Create 3 data sources (or more).

 Certify these data sources after validating that they provide correct information. Provide users with clear descriptions.

  1. The original large dataset.

  2. The subset of ~20 fields used by 90% of users.

  3. The remaining fields for the other 10% of users.

  4. An extract of 1.

  5. An extract of 2.

  6. An extract of 3.

Importantly, if the field names match in each data source (i.e., they are never renamed manually), then it is easy for a user to 'scale up' to a larger dataset as needed. They can generally start their exploration against a small subset of the data and then use Tableau's 'Replace Data Source' feature to switch to a larger data source while keeping their existing views. (This wouldn't work as well, if at all, for scaling down, though.)
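For data sources 2 and 3, the simplest implementation is a BigQuery view over the wide table, so the field definitions stay in one place and keep their original names; a view that selects only the needed columns also caps the columns (and therefore bytes) any downstream query can scan. A rough sketch, again with the google-cloud-bigquery Python client and hypothetical table/field names (the actual ~20 fields would come from auditing which fields your 90% actually use):

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes credentials are already configured in the environment

# Hypothetical names: the existing wide table and the narrow view
# that the 90% of users would connect to from Tableau.
source_table = "my-project.analytics.wide_fact_table"
view_id = "my-project.analytics.core_fields_v"

view = bigquery.Table(view_id)
view.view_query = f"""
    SELECT
        event_date,
        customer_id,
        revenue
        -- ... the rest of the ~20 commonly used fields ...
    FROM `{source_table}`
"""

view = client.create_table(view)
print(f"Created view {view.full_table_id}")
```

Because the view keeps the original field names, users who connect to it can still move up to the full data source later with 'Replace Data Source', as described above.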
