
I have a fairly large dataset in the form of a pandas dataframe. How can I split the dataframe into two random samples (80% and 20%) for training and testing?

Thanks!

1 Answer


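To create train and test samples from a single dataframe with pandas, one common approach is to build a random boolean mask with numpy's rand (not randn, which is only used here to generate sample data). Each row gets a uniform random number in [0, 1), and rows whose number is below 0.8 go to the training set, so the split is only approximately 80/20: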

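import numpy as np
import pandas as pd

# Sample data: 100 rows, 2 columns of standard-normal values
df = pd.DataFrame(np.random.randn(100, 2))

# Boolean mask: True for roughly 80% of the rows
msk = np.random.rand(len(df)) < 0.8

train = df[msk]   # rows where the mask is True
test = df[~msk]   # the remaining rows

print(len(test))
print(len(train))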
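
If you need an exact 80/20 split rather than an approximate one, a possible alternative is pandas' own DataFrame.sample combined with DataFrame.drop. This is only a sketch: it assumes the dataframe has a unique index, and the random_state value of 42 is an arbitrary seed chosen here so the result is repeatable.

```python
import numpy as np
import pandas as pd

# Sample data: 100 rows, 2 columns (same shape as the example above)
df = pd.DataFrame(np.random.randn(100, 2))

# Draw exactly 80% of the rows at random, without replacement.
# random_state=42 is an arbitrary seed (an assumption, not required).
train = df.sample(frac=0.8, random_state=42)

# Everything not drawn into train becomes the test set.
# This relies on the index labels being unique.
test = df.drop(train.index)

print(len(train))
print(len(test))
```

Unlike the mask approach, the sizes here are fixed by frac, so the two sets always add up to the full dataframe with no overlap.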
