Hello, I am using a random forest regressor. The RMSE on the training set is 0.007, but on the test set it is 0.02, so I think the model is overfitting. How can I bring the test RMSE down closer to 0.007, and how can I improve the model's performance overall?
This is my code:
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor

# Default settings: each tree is grown to full depth, with no pruning or depth limit
rf = RandomForestRegressor(n_estimators=200)
rf.fit(X_train, y_train)

y_train_predicted = rf.predict(X_train)
y_test_predicted = rf.predict(X_test)

# squared=False returns the RMSE, not the MSE
rmse_train = mean_squared_error(y_train, y_train_predicted, squared=False)
rmse_test = mean_squared_error(y_test, y_test_predicted, squared=False)
print("RF, Train RMSE: {} Test RMSE: {}".format(rmse_train, rmse_test))
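The print label above mentions pruned trees, but the forest is fit with default settings, so no limit on tree size is applied. As a rough sketch of what a size-limited forest could look like next to the default one, here is a self-contained version on synthetic data. The dataset from `make_regression`, the `rmse` and `train_test_rmse` helpers, and the particular values `min_samples_leaf=5` and `max_depth=8` are all assumptions for illustration, not the real data or tuned settings. The helper takes the square root of the MSE itself because the `squared` argument is deprecated in recent scikit-learn releases.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def rmse(y_true, y_pred):
    # Square root of the MSE; does not rely on the `squared` argument
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def train_test_rmse(model, X_train, X_test, y_train, y_test):
    # Fit on the training split and report RMSE on both splits
    model.fit(X_train, y_train)
    return rmse(y_train, model.predict(X_train)), rmse(y_test, model.predict(X_test))


# Synthetic stand-in data (assumption: the real X and y are not shown in the post)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Default forest: trees grown to full depth
full = RandomForestRegressor(n_estimators=200, random_state=0)

# Size-limited forest: illustrative values only, not tuned
limited = RandomForestRegressor(
    n_estimators=200, min_samples_leaf=5, max_depth=8, random_state=0
)

full_train, full_test = train_test_rmse(full, X_train, X_test, y_train, y_test)
limited_train, limited_test = train_test_rmse(
    limited, X_train, X_test, y_train, y_test
)

print("Full forest    Train RMSE: {:.3f} Test RMSE: {:.3f}".format(full_train, full_test))
print("Limited forest Train RMSE: {:.3f} Test RMSE: {:.3f}".format(limited_train, limited_test))
```

Limiting tree size tends to narrow the train/test gap mostly by raising the training error, so it does not by itself guarantee a lower test RMSE. A fully grown forest nearly memorizes its training rows, which means some gap between the two numbers is expected even when the model generalizes reasonably well.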