Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in GCP by (19.7k points)
recategorized by

I just discovered that the RAND() function, while undocumented, works in BigQuery. I was able to generate a (seemingly) random sample of 10 words from the Shakespeare dataset using:

SELECT word FROM

(SELECT rand() as random,word FROM [publicdata:samples.shakespeare] ORDER BY random)

LIMIT 10

 My question is: Are there any disadvantages to using this approach instead of the HASH() method defined in the "Advanced examples" section of the reference manual? https://developers.google.com/bigquery/query-reference

closed

1 Answer

+1 vote
by (62.9k points)
selected by
 
Best answer

RAND() function can be used to generate a (seemingly) random number. You can use it to fetch a sample of 10 words from the Shakespeare dataset using:

#standardSQL

SELECT word

FROM `publicdata.samples.shakespeare`

WHERE RAND() < 10/164656

or even better:

#standardSQL

SELECT word

FROM `publicdata.samples.shakespeare`

WHERE RAND() < 10/(SELECT COUNT(*) FROM `publicdata.samples.shakespeare`) 

(where 10 is the approximate number of results I want to get, and 164656 the number of rows that table has)

Browse Categories

...