0 votes
2 views
in Data Science by (17.6k points)

I need to fit RandomForestRegressor from sklearn.ensemble.

forest = ensemble.RandomForestRegressor(**RF_tuned_parameters)

model = forest.fit(train_fold, train_y)

yhat = model.predict(test_fold)

This code always worked until I added some preprocessing of the data (train_y). Now I get this warning:

DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().

model = forest.fit(train_fold, train_y)

Previously train_y was a Series; now it's a NumPy array (a column vector). If I apply train_y.ravel(), it becomes a 1-D array and the message disappears, though the prediction step takes a very long time (actually, it never finishes...).

In the docs of RandomForestRegressor I found that train_y should be defined as y : array-like, shape = [n_samples] or [n_samples, n_outputs]. Any idea how to solve this issue?

3 Answers

0 votes
by (41.4k points)

Use this line:

model = forest.fit(train_fold, train_y.values.ravel())

instead of

model = forest.fit(train_fold, train_y)
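Note that .values exists only on pandas objects. The question says train_y is now a NumPy array, and calling train_y.values on an ndarray raises an AttributeError. A minimal sketch of a conversion that works for either type is shown here; the helper name to_1d_target and the sample data are assumptions made for illustration, not part of the original code.

```python
import numpy as np


def to_1d_target(y):
    """Return y as a 1-D array of shape (n_samples,).

    Hypothetical helper: np.asarray accepts ndarrays, lists, and
    pandas Series alike, so no `.values` attribute is required.
    """
    return np.asarray(y).ravel()


# Illustrative data: a column vector of shape (3, 1), as in the warning.
column_y = np.array([[1.0], [2.0], [3.0]])
flat_y = to_1d_target(column_y)

print(column_y.shape, flat_y.shape)  # (3, 1) (3,)
```

With such a helper, the fit call would read forest.fit(train_fold, to_1d_target(train_y)). This only makes sense for a single target column; with several columns, ravel() would merge them into one long vector.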


0 votes
by (37.3k points)

To write a SQL query that retrieves the latest date for each username, you can use a GROUP BY clause together with the MAX function.

Query: 

Select username, max(date) as latest_date, value 

From table_name 

Group by username;  

  • ‘Select username, max(date) as latest_date, value’: Selects the username, the maximum date for each username (returned as latest_date), and the value column.

  • ‘From table_name’: Indicates the table from which to read the data.

  • ‘Group by username’: Groups the rows by username so that max(date) is computed separately for each user.
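One caveat about the query above: value is selected without being aggregated or grouped, so many database engines either reject the query or return an arbitrary value for each user. A minimal sketch of a portable variant follows, which joins the table to a grouped subquery instead of selecting value directly. It uses Python's built-in sqlite3 module; the sample rows and the column types are assumptions made purely for illustration.

```python
import sqlite3

# In-memory database with made-up sample rows (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (username TEXT, date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [
        ("alice", "2024-01-01", 10),
        ("alice", "2024-03-15", 30),
        ("bob", "2024-02-10", 20),
        ("bob", "2024-01-20", 5),
    ],
)

# The subquery finds each user's latest date; the join then picks up
# the value from the row that actually carries that date.
query = """
SELECT t.username, t.date AS latest_date, t.value
FROM table_name AS t
JOIN (
    SELECT username, MAX(date) AS latest_date
    FROM table_name
    GROUP BY username
) AS m
  ON m.username = t.username AND m.latest_date = t.date
ORDER BY t.username
"""
latest_rows = conn.execute(query).fetchall()
print(latest_rows)  # [('alice', '2024-03-15', 30), ('bob', '2024-02-10', 20)]
```

The dates are stored as ISO-8601 strings here, so MAX compares them correctly as text. If two rows for a user share the same latest date, this variant returns both.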

0 votes
by (37.3k points)

You are getting ‘A column-vector y was passed when a 1d array was expected’ because, after your preprocessing, y_train is no longer a one-dimensional array. To convert a multi-dimensional array into a 1D array, NumPy provides the flatten() method.

y_train = y_train.flatten()

Try the code above; it converts y_train into a 1D array.
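For comparison with the ravel() approach mentioned earlier in the thread, here is a small sketch of how the two NumPy methods differ: flatten() always returns a copy, while ravel() returns a view of the original data when the memory layout allows it. Both yield the (n_samples,) shape the warning asks for. The sample array is an assumption made for illustration.

```python
import numpy as np

# Illustrative column vector of shape (3, 1).
y_train = np.array([[1.0], [2.0], [3.0]])

flat_copy = y_train.flatten()  # always allocates a new array
flat_view = y_train.ravel()    # a view here, because y_train is contiguous

print(flat_copy.shape, flat_view.shape)      # (3,) (3,)
print(np.shares_memory(y_train, flat_copy))  # False
print(np.shares_memory(y_train, flat_view))  # True
```

Either result can be passed to fit(). The distinction matters mainly for memory use on large arrays, or if the flattened array is modified afterwards.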
