Linear Regression test data violating training data. Please explain where I went wrong

Question

asked Feb 12, 2020 in Data Science by blackindya (18.4k points)

The dataset I am using consists of 100 entries giving the rent price of houses at different locations.

After training, I passed the training data back in as test data, but the prediction does not match the value I expected.

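from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# df is the DataFrame holding the rent dataset
X_loc = df[['area', 'rooms', 'location']]  # a list (not a set) keeps the column order fixed
y_loc = df['price']
X_train, X_test, y_train, y_test = train_test_split(X_loc, y_loc, test_size=1/3, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_train[0:1])  # predict for the first row of the training split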

Dataset which I am using is:

       price  rooms  area  location
0  0   22000      3  1339       140
1  1   45000      3  1580        72
3  3   72000      3  2310        72
4  4   40000      3  1800        41
5  5   35000      3  2100        57

The expected value of y_pred is 220000, but I am getting 290000. How is that possible?

1 Answer

supriya · Answer 1 · 2020-02-12T13:34:23+0000

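As far as I know, the relationship in your data may be non-linear while you are applying linear regression, which could be why you are getting incorrect results. Note also that linear regression minimizes the overall squared error, so it does not reproduce each training value exactly, even on data it was trained on. There are other reasons why linear regression can fail. Some of them are mentioned below: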

The explanatory variables may not explain enough of the variation in the response.
There may be non-linear associations between the variables, which a linear model cannot capture well.
Interactions between the features can also make the relationship non-linear.
A Generalized Linear Model (GLM) generalizes linear regression and can handle some non-linearities through its non-linear link function.
For a non-linear relationship, we can use support vector regression in scikit-learn with an RBF or polynomial kernel.
If you don’t know which function to use, an alternative way to analyze the data is to apply iterative methods such as the Generalized Linear Mixed Model (GLMM) and Generalized Estimating Equations (GEE). If they seem too complex, try Ridge regression, because it can handle multicollinearity, that is, strongly correlated features.
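
As a rough sketch of how the points above could be checked, the snippet below fits a plain linear model, a Ridge model, and an RBF-kernel SVR, and prints their training and test R² scores. It does not use the dataset from the question: the DataFrame, the price formula, and the hyperparameters (alpha, C) are all assumptions made up for illustration. It also prints which row X_train actually starts with, since train_test_split shuffles the rows by default, so the first training row is not necessarily row 0 of the original data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the rent data (NOT the dataset from the question).
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "area": rng.uniform(800, 2500, n),
    "rooms": rng.integers(1, 5, n),
    "location": rng.integers(1, 150, n),
})
# Assumed non-linear price rule plus noise, for illustration only.
df["price"] = (
    20000
    + 0.01 * df["area"] ** 2
    + 3000 * df["rooms"]
    + 5000 * np.sin(df["location"] / 10)
    + rng.normal(0, 3000, n)
)

X = df[["area", "rooms", "location"]]
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)

# The split shuffles rows, so check which original row comes first.
first_label = X_train.index[0]

linear = LinearRegression().fit(X_train, y_train)
pred_first = linear.predict(X_train.iloc[[0]])[0]
actual_first = y_train.iloc[0]
print(f"row {first_label}: predicted {pred_first:.0f}, actual {actual_first:.0f}")

# Ridge and SVR are scaled first; alpha and C are arbitrary choices here.
models = {
    "linear": linear,
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_train, y_train),
    "svr_rbf": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1e5)).fit(X_train, y_train),
}
for name, model in models.items():
    train_r2 = model.score(X_train, y_train)
    test_r2 = model.score(X_test, y_test)
    print(f"{name}: train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

A training R² well below 1 suggests that the features, in the form given, do not explain all of the variation in price, in which case a gap between a single prediction and its actual value is expected rather than a bug.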

