I have a pandas DataFrame with 50k rows. I'm trying to add a new column of randomly generated integers from 1 to 5.

If I want 50k random numbers I'd use:

df1['randNumCol'] = random.sample(range(50000), len(df1))

but for this I'm not sure how to do it.

Side note: in R, I'd do:

sample(1:5, 50000, replace = TRUE)

Any suggestions?

1 Answer

You can use np.random.randint, which returns random integers from low (inclusive) to high (exclusive):

import numpy as np

df1['randNumCol'] = np.random.randint(1, 6, df1.shape[0])

# or if the numbers are non-consecutive (albeit slower)

df1['randNumCol'] = np.random.choice([1, 9, 20], df1.shape[0])
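As a self-contained sketch of the same idea, the snippet below uses NumPy's newer Generator interface (np.random.default_rng) instead of the legacy np.random functions shown above. The stand-in DataFrame, the "value" and "randChoiceCol" column names, and the seed are assumptions made only for illustration; they are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 50k-row DataFrame from the question.
df1 = pd.DataFrame({"value": np.arange(50_000)})

# Seeded generator so repeated runs give the same numbers; the seed is arbitrary.
rng = np.random.default_rng(seed=0)

# Generator.integers excludes `high` by default, so high=6 yields values 1..5.
df1["randNumCol"] = rng.integers(low=1, high=6, size=len(df1))

# For non-consecutive values, sample with replacement from an explicit list.
df1["randChoiceCol"] = rng.choice([1, 9, 20], size=len(df1))

# Quick sanity check: how often each value from 1 to 5 was drawn.
print(df1["randNumCol"].value_counts().sort_index())
```

Both calls sample with replacement, which matches the R call sample(1:5, 50000, replace = TRUE) from the question.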