
I'm kind of new to Python. Can anyone tell me why we set random_state to zero when splitting the data into train and test sets?

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.30, random_state=0)

I have also seen cases like this, where random_state is set to one:

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.30, random_state=1)

What effect does random_state have in cross-validation as well?

1 Answer


random_state can be 0, 1, or any other integer. Use the same value on every run if you want results that you can reproduce and compare across multiple runs of the code. By the way, I have seen random_state=42 used in many official scikit-learn examples.

The random_state parameter initializes the internal random number generator, which in your case decides how the data is split into train and test indices.
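To see that the seed decides which rows land in the test set, here is a minimal sketch. The toy arrays X and y are made up for illustration and are not from the question; any data would behave the same way.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, invented for this example: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two calls with the same seed.
_, X_test_a, _, y_test_a = train_test_split(X, y, test_size=0.30, random_state=0)
_, X_test_b, _, y_test_b = train_test_split(X, y, test_size=0.30, random_state=0)

# One call with a different seed, for comparison.
_, X_test_c, _, y_test_c = train_test_split(X, y, test_size=0.30, random_state=1)

# The same seed reproduces the same test rows.
same_seed_matches = np.array_equal(y_test_a, y_test_b)
print("same seed, same split:", same_seed_matches)
print("test labels with seed 0:", y_test_a)
print("test labels with seed 1:", y_test_c)
```

The value itself (0, 1, 42, ...) carries no meaning; it only picks which shuffle you get.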

If random_state is None or np.random, then the global RandomState object used by np.random is returned, so the split changes from run to run unless that global generator has been seeded.

If random_state is an integer, then it is used to seed a new RandomState object.
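The two cases can be checked directly with sklearn.utils.check_random_state, the helper whose behaviour is described here. This is only a sketch; I am assuming that helper is what interprets random_state in your call, and the variable names are mine.

```python
import numpy as np
from sklearn.utils import check_random_state

# An integer seeds a brand-new RandomState, so two generators built from
# the same integer produce the same numbers.
rng_a = check_random_state(0)
rng_b = check_random_state(0)
draws_a = rng_a.randint(0, 100, size=5)
draws_b = rng_b.randint(0, 100, size=5)
int_seed_repeats = np.array_equal(draws_a, draws_b)

# None hands back NumPy's shared global generator instead of a fresh one,
# so what it produces depends on the state that generator happens to be in.
rng_global_1 = check_random_state(None)
rng_global_2 = check_random_state(None)
none_is_shared = rng_global_1 is rng_global_2

print("integer seed repeats:", int_seed_repeats)
print("None returns the shared generator:", none_is_shared)
```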

Fixing the seed lets you check and validate your results when running the code multiple times. Setting random_state to a fixed value guarantees that the same sequence of random numbers is generated each time you run the code, so you get the same train/test split every time.
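The same idea carries over to cross-validation. With a splitter such as KFold, random_state only matters when shuffle=True; a fixed seed then gives the same folds on every run. A small sketch, where the toy data and the helper fold_test_indices are hypothetical names of my own:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data, invented for this example: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)

def fold_test_indices(seed):
    """Return the test indices of each fold for a shuffled 5-fold split."""
    splitter = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [test_idx.tolist() for _, test_idx in splitter.split(X)]

# Building the splitter twice with the same seed gives identical folds.
folds_run_1 = fold_test_indices(0)
folds_run_2 = fold_test_indices(0)
folds_match = folds_run_1 == folds_run_2

print("folds identical across runs:", folds_match)
print(folds_run_1)
```

Without a fixed seed, a shuffled splitter produces different folds each run, so scores can vary slightly between runs even when nothing else changed.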

Hope this answer helps you!
