
I have a dataset that I've loaded into BigQuery. It consists of three separate tables that share a common identifier.

When I set up my project in Tableau, I performed an inner join on two of the tables. I set the connection up as an extract, not a live connection.

There's some geo info in my data: lats and longs. When I drag lat to the rows section on my worksheet, it takes an eternity. So far it has spent 18 minutes and counting just processing that one action.

Is there some way I can take a random sample of my data to work on, rather than having to wait for each query to process? My data is not even that big; it's around 1M rows.

1 Answer

  • I have found Tableau to bog down well before 1 million rows, and I suspect the join compounds the problem for you.

  • Try aggregating as much as possible in BigQuery itself before making the extract. Taking a random sample is a good idea, too. You could try:

SELECT * FROM ([subquery joining your tables]) WHERE RAND() < 0.05 # or whatever gives an acceptable sample size
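
To get a feel for what "acceptable sample size" means here: a filter like RAND() < 0.05 keeps each row independently with probability 0.05, so the result is roughly, not exactly, 5% of the rows. The Python sketch below only simulates that behavior locally; it does not talk to BigQuery, and the function name, the seed, and the 1,000,000-row figure (taken from the question) are assumptions for illustration.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    This mimics the effect of `WHERE RAND() < fraction` in the query above;
    it is a local simulation, not a BigQuery call.
    """
    rng = random.Random(seed)  # seeded only so this sketch is repeatable
    return [row for row in rows if rng.random() < fraction]


total_rows = 1_000_000  # approximate table size mentioned in the question
fraction = 0.05         # same threshold as in the query

sample = bernoulli_sample(range(total_rows), fraction, seed=42)

# The size lands near total_rows * fraction, but varies from run to run
# when no seed is fixed.
print(len(sample), "rows kept; expected about", int(total_rows * fraction))
```

If the sample still feels slow in Tableau, lowering the fraction shrinks the extract proportionally, at the cost of sparser points on the map.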
