The dataset I am using consists of 100 entries, each giving the rent price of a house at a different location.

After training, I passed a row of the training data back in as test data, but the prediction I get does not match the actual price.

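from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Select the feature columns with a list; a set has no defined column order
X_loc = df[['area', 'rooms', 'location']]
y_loc = df['price']

X_train, X_test, y_train, y_test = train_test_split(X_loc, y_loc, test_size=1/3, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predict the price for the first row of the training set
y_pred = regressor.predict(X_train[0:1])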

The dataset I am using is:

   price  rooms  area  location
0  22000      3  1339       140
1  45000      3  1580        72
3  72000      3  2310        72
4  40000      3  1800        41
5  35000      3  2100        57

The expected value of y_pred is 220000, but I am getting 290000. How is that possible?

1 Answer


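The relationship between your features and the price may be non-linear, while linear regression can only fit a linear one. That is the most likely reason for the mismatch. Note also that a least-squares fit minimizes the total error over all training rows, so it generally does not reproduce any single training target exactly, even for a row it was trained on. Other reasons why linear regression can give poor results, and some alternatives, are listed below: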

  1. The explanatory variables do not explain enough of the variation in the response.

  2. There may be non-linear associations between the variables, which a linear model cannot capture.

  3. Interactions between features make the response non-linear in the original features.

  4. The Generalized Linear Model (GLM) generalizes linear regression and can handle some non-linearities through its non-linear link function.

  5. Support vector regression (SVR) in scikit-learn, with an RBF or polynomial kernel, can fit non-linear relationships.

  6. If you do not know which functional form to use, an alternative is to fit models such as the Generalized Linear Mixed Model (GLMM) or Generalized Estimating Equations (GEE). If these seem too complex, try Ridge regression, which handles multicollinearity (strongly correlated features).
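A rough sketch of some of these points, assuming scikit-learn and NumPy are installed. The data here is synthetic and made up for illustration (price in thousands, driven by an area * rooms interaction); it is not the dataset from the question, and the model settings are arbitrary choices rather than tuned values. It fits a plain linear model, a linear model with interaction terms, Ridge, and an RBF-kernel SVR on the same training data, then compares the training R^2 and the prediction for the first training row.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption, NOT the asker's dataset):
# price (in thousands) depends on area * rooms, so it is non-linear
# in the raw features.
rng = np.random.default_rng(0)
area = rng.uniform(800, 2500, size=100)
rooms = rng.integers(1, 5, size=100)  # 1 to 4 rooms
X = np.column_stack([area, rooms])
y = 0.01 * area * rooms + rng.normal(0, 0.5, size=100)

models = {
    # Plain least squares on the raw features
    "linear": LinearRegression(),
    # Degree-2 terms add area*rooms, so the interaction becomes fittable
    "linear + interaction terms": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False), LinearRegression()
    ),
    # Same linear form with an L2 penalty (helps with correlated features)
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    # Kernel method that can bend to non-linear relationships
    "svr (rbf)": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0)),
}

scores = {}
first_row_pred = {}
for name, model in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)  # R^2 on the training data itself
    first_row_pred[name] = model.predict(X[:1])[0]
    print(
        f"{name}: train R^2 = {scores[name]:.4f}, "
        f"first row actual = {y[0]:.2f}, predicted = {first_row_pred[name]:.2f}"
    )
```

Even on its own training rows, the plain linear model does not return the exact target, because it only minimizes the overall squared error. Comparing the printed R^2 values is a quick way to see whether a non-linear alternative is worth trying on your own data.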
