Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in GCP by (19.7k points)
recategorized by

I just discovered that the RAND() function, while undocumented, works in BigQuery. I was able to generate a (seemingly) random sample of 10 words from the Shakespeare dataset using:

SELECT word FROM

(SELECT rand() as random,word FROM [publicdata:samples.shakespeare] ORDER BY random)

LIMIT 10

 My question is: Are there any disadvantages to using this approach instead of the HASH() method defined in the "Advanced examples" section of the reference manual? https://developers.google.com/bigquery/query-reference

closed

1 Answer

+1 vote
by (62.9k points)
selected by
 
Best answer

RAND() function can be used to generate a (seemingly) random number. You can use it to fetch a sample of 10 words from the Shakespeare dataset using:

#standardSQL

SELECT word

FROM `publicdata.samples.shakespeare`

WHERE RAND() < 10/164656

or even better:

#standardSQL

SELECT word

FROM `publicdata.samples.shakespeare`

WHERE RAND() < 10/(SELECT COUNT(*) FROM `publicdata.samples.shakespeare`) 

(where 10 is the approximate number of results I want to get, and 164656 the number of rows that table has)

Welcome to Intellipaat Community. Get your technical queries answered by top developers!

30.5k questions

32.5k answers

500 comments

108k users

Browse Categories

...