What is Linear Regression in Python? Simple and Multiple Linear Regression

Introduction to Linear Regression in Python

Linear regression is a supervised machine learning algorithm used to predict a continuous value from one or more independent variables. It is a simple yet powerful technique that can be applied to a variety of problems, such as predicting house prices, sales figures, and customer behavior.

What is Linear Regression?

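As mentioned above, linear regression is a predictive modeling technique. It is used when there is an approximately linear relationship between the dependent variable y and the independent variable x:

y = b0 + b1 * x

Here, b0 is the intercept (the value of y when x is 0) and b1 is the slope, or coefficient of x. The fitted line estimates how much y changes, on average, when x changes by a certain amount: a one-unit increase in x changes the predicted y by b1.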
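
As a minimal sketch of the equation above, the snippet below evaluates y = b0 + b1 * x for made-up coefficients. The function name `predict` and the values of b0 and b1 are illustrative assumptions, not values estimated from any dataset.

```python
def predict(x, b0, b1):
    """Return the value of y on the line y = b0 + b1 * x."""
    return b0 + b1 * x


# Illustrative coefficients (assumed, not estimated from data).
b0, b1 = 2.0, 0.5

y_at_4 = predict(4, b0, b1)
y_at_5 = predict(5, b0, b1)

# On a straight line, increasing x by one unit changes y by b1.
print(y_at_4, y_at_5, y_at_5 - y_at_4)
```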

Linear Regression

As we see in the picture, a flower’s sepal length is mapped onto the x-axis and its petal length onto the y-axis. Linear regression lets us describe how the petal length changes with respect to the sepal length. The example below illustrates the idea further.

Example:

Let’s assume there is a telecom network called Neo. Its delivery manager wants to find out if there’s a relationship between the monthly charges of a customer and the tenure of the customer. So, he collects all customer data and fits a linear regression model, taking monthly charges as the dependent variable and tenure as the independent variable. The fitted model shows that there is a relationship between the two: as the tenure of the customer increases, the monthly charges also increase. With the best-fit line, the delivery manager can predict the value of y for any new value of x.

Example of Linear Regression

Let us say the tenure of a customer is 45 months, and with the help of the best-fit line, the delivery manager can predict that the customer’s monthly charges would be around $64.

Similarly, if the tenure of a customer is 69 months, with the help of the best-fit line, the delivery manager can predict that the customer’s monthly charges would be around $110.

This is how linear regression works. Now, the question is how to find the best-fit line.

Linear Regression Line of Best Fit

The line of best fit is the line that best expresses the relationship between the data points. To find it, we use the concept of residuals, shown in the image below:

Linear Regression Line of Best Fit

The red lines in the above image denote the residuals, which are the differences between the actual values and the predicted values. How do residuals help in finding the best-fit line?

To find the best-fit line, we use the residual sum of squares (RSS): we square each residual and sum the results.

RSS = Σ (y_i - ŷ_i)^2

Here, y_i is the actual value and ŷ_i is the predicted value for the i-th data point. The line with the lowest value of RSS is the best-fit line.
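
The RSS idea can be sketched in a few lines of NumPy. The five data points below are made up for illustration, and the helper names `rss` and `best_fit_line` are assumptions of this sketch; the slope and intercept use the standard closed-form least-squares solution for a single predictor.

```python
import numpy as np


def rss(x, y, b0, b1):
    """Residual sum of squares of the line y = b0 + b1 * x on the data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (b0 + b1 * x)  # actual minus predicted
    return float(np.sum(residuals ** 2))


def best_fit_line(x, y):
    """Closed-form least-squares intercept and slope for one predictor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b0 = y_mean - b1 * x_mean
    return float(b0), float(b1)


# Made-up data points, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = best_fit_line(x, y)
print("best-fit line:", b0, b1, "RSS:", rss(x, y, b0, b1))

# Any other candidate line has a higher RSS on the same data.
print("candidate line RSS:", rss(x, y, 0.0, 2.0))
```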

Now, let us see how the coefficient of x influences the relationship between the independent and dependent variables.

Regression Coefficient

In simple linear regression, if the coefficient of x is positive, we can conclude that the relationship between the independent and dependent variables is positive.

Regression Coefficient Positive Relation

Here, if the value of x increases, the value of y also increases.

Now, if the coefficient of x is negative, we can say that the relationship between the independent and dependent variables is negative.

Regression Coefficient Negative Relation

Here, if the value of x increases, the value of y decreases.
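
A small sketch can make the sign of the coefficient concrete. It fits scikit-learn’s `LinearRegression` to two tiny made-up datasets, one rising and one falling, and reads the slope from `coef_`. The data values and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# scikit-learn expects the predictors as a 2-D array (rows x features).
x = np.array([[1.0], [2.0], [3.0], [4.0]])

# Made-up responses: one grows with x, the other shrinks.
y_rising = np.array([3.0, 5.0, 7.0, 9.0])
y_falling = np.array([9.0, 7.0, 5.0, 3.0])

rising_slope = LinearRegression().fit(x, y_rising).coef_[0]
falling_slope = LinearRegression().fit(x, y_falling).coef_[0]

# A positive slope means y increases with x; a negative slope means it decreases.
print(rising_slope, falling_slope)
```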

Now, let us apply these concepts to build linear regression models. In the Python examples that follow, we will build two machine learning models: one for simple and one for multiple linear regression. Let’s begin.

Practical Application: Linear Regression with Python’s Scikit-learn

  • Dataset in Focus: Boston Housing Price Records
  • Environment: Python 3 and Jupyter Notebook
  • Library: Pandas
  • Module: Scikit-learn

Dataset Overview

Before diving into the linear regression exercise in Python, it’s crucial to familiarize ourselves with the dataset. We’ll be analyzing the Boston Housing Price Dataset, which comprises 506 entries and 13 attributes, along with a target column (medv).

Let’s take a quick look at the dataset columns:

  • Crim: Crime rate per capita by town
  • Zn: Fraction of residential land allocated for large plots (over 25,000 sq. ft.)
  • Indus: Fraction of non-retail business acres in town
  • Chas: Indicator for Charles River proximity (1 if close; 0 if not)
  • Nox: Nitric oxide concentration (parts per 10 million)
  • Rm: Typical number of rooms in a residence
  • Age: Fraction of homes built before 1940
  • Dis: Weighted distance to five Boston employment centers
  • Rad: Proximity index to major highways
  • Tax: Property tax rate (per $10,000)
  • Ptratio: Student-to-teacher ratio in town
  • Black: Value calculated as 1000(Bk – 0.63)^2, where Bk represents the fraction of Black residents in town
  • Lstat: Percentage of the population with lower status
  • Medv: Median price of homes (in $1000s)

In this linear regression tutorial, our objective is to develop two predictive models for housing prices.

Model Development

With a clear understanding of our dataset, let’s proceed to construct our linear regression models in Python.

Univariate Linear Regression in Python

Here, we use ‘lstat’ as the predictor (independent variable) and ‘medv’ as the response (dependent variable):

Step 1: Initialize the Boston dataset

Step 2: Examine dataset dimensions

Step 3: Preview predictor and response variables
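
Steps 1 to 3 can be sketched with pandas. Recent scikit-learn releases (1.2 and later) no longer ship `load_boston`, so this sketch assumes the data is available as a CSV with lowercase column names. The helper `load_housing`, the file name "Boston.csv" mentioned in the comment, and the three sample rows are all hypothetical stand-ins so that the snippet runs anywhere.

```python
import io

import pandas as pd

# Hypothetical stand-in for a local copy of the dataset: three made-up rows
# with only two of the fourteen columns.
SAMPLE_CSV = """lstat,medv
5.0,30.0
12.5,21.0
20.0,14.5
"""


def load_housing(source):
    """Read the housing data from a CSV path or file-like object."""
    return pd.read_csv(source)


# In practice this might be load_housing("Boston.csv") (assumed file name).
boston = load_housing(io.StringIO(SAMPLE_CSV))

# Step 2: dataset dimensions as (rows, columns).
print(boston.shape)

# Step 3: first rows of the predictor and the response.
print(boston[["lstat", "medv"]].head())
```

With the full dataset described above, the shape would be expected to show 506 rows.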

Step 4: Visualize variable trends
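
One way to visualize the two variables is a scatter plot. The sketch below uses matplotlib, which is an assumption since the original plotting code is not shown, and it draws synthetic stand-in data rather than the real dataset. The coefficients used to generate the data are made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the Boston dataset): medv falls as lstat rises.
rng = np.random.default_rng(0)
lstat = rng.uniform(2, 38, size=100)
boston = pd.DataFrame(
    {"lstat": lstat, "medv": 35 - 0.8 * lstat + rng.normal(0, 2, size=100)}
)

fig, ax = plt.subplots()
ax.scatter(boston["lstat"], boston["medv"], alpha=0.6)
ax.set_xlabel("lstat")
ax.set_ylabel("medv")
ax.set_title("medv versus lstat (synthetic data)")
```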

Step 5: Segregate data into predictors and responses

Step 6: Partition data for training and testing


Step 7: Review dimensions of training and test datasets
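
Steps 5 to 7 can be sketched with scikit-learn’s `train_test_split`. The data here is a synthetic stand-in for the real dataset, and the 80/20 split and `random_state=0` are assumptions, since the original split settings are not shown.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Boston dataset).
rng = np.random.default_rng(0)
lstat = rng.uniform(2, 38, size=100)
boston = pd.DataFrame(
    {"lstat": lstat, "medv": 35 - 0.8 * lstat + rng.normal(0, 2, size=100)}
)

# Step 5: predictors as a 2-D frame, response as a 1-D series.
X = boston[["lstat"]]
y = boston["medv"]

# Step 6: hold out 20% of the rows for testing (assumed proportion).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 7: dimensions of each part.
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```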

Step 8: Start the model training

Step 9: Extract the y-intercept

Step 10: Extract the regression coefficient
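
Steps 8 to 10 can be sketched as follows. The model is scikit-learn’s `LinearRegression`; the data is a synthetic stand-in generated with made-up coefficients, so the printed intercept and slope say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Boston dataset).
rng = np.random.default_rng(0)
lstat = rng.uniform(2, 38, size=100)
boston = pd.DataFrame(
    {"lstat": lstat, "medv": 35 - 0.8 * lstat + rng.normal(0, 2, size=100)}
)

X_train, X_test, y_train, y_test = train_test_split(
    boston[["lstat"]], boston["medv"], test_size=0.2, random_state=0
)

# Step 8: fit the model on the training rows only.
model = LinearRegression()
model.fit(X_train, y_train)

# Step 9: the y-intercept (b0).
print("intercept:", model.intercept_)

# Step 10: the regression coefficient of lstat (b1), one entry per predictor.
print("coefficient:", model.coef_)
```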

Step 11: Generate predictions


Step 12: Compare with actual values

Step 13: Assess model performance
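
Steps 11 to 13 can be sketched as below. The choice of metrics (MAE, MSE, RMSE) is an assumption, since the original evaluation code is not shown, and the data is again a synthetic stand-in rather than the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Boston dataset).
rng = np.random.default_rng(0)
lstat = rng.uniform(2, 38, size=100)
boston = pd.DataFrame(
    {"lstat": lstat, "medv": 35 - 0.8 * lstat + rng.normal(0, 2, size=100)}
)
X_train, X_test, y_train, y_test = train_test_split(
    boston[["lstat"]], boston["medv"], test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# Step 11: predictions for the held-out rows.
y_pred = model.predict(X_test)

# Step 12: actual and predicted values side by side.
comparison = pd.DataFrame({"actual": y_test.to_numpy(), "predicted": y_pred})
print(comparison.head())

# Step 13: common error metrics; lower values mean a closer fit.
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = float(np.sqrt(mse))
print("MAE:", mae, "MSE:", mse, "RMSE:", rmse)
```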

Multivariate Linear Regression in Python

Here, we use ‘medv’ as the response (dependent variable) and all other attributes as predictors (independent variables):

Step 1: Initialize the Boston dataset

Step 2: Define response and predictor variables

Step 3: Preview predictor variables

Step 4: Preview response variable

Step 5: Partition data for training and testing

Step 6: Review dimensions of training and test datasets
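
Steps 2 to 6 can be sketched as follows. To stay runnable without the data file, the sketch builds a synthetic frame with only three of the thirteen predictors (rm, lstat, ptratio); the generating coefficients, the 80/20 split, and `random_state=0` are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Boston dataset), three predictors only.
rng = np.random.default_rng(0)
n = 100
boston = pd.DataFrame(
    {
        "rm": rng.uniform(4, 8, size=n),
        "lstat": rng.uniform(2, 38, size=n),
        "ptratio": rng.uniform(13, 22, size=n),
    }
)
boston["medv"] = (
    5 + 5 * boston["rm"] - 0.5 * boston["lstat"] - 0.5 * boston["ptratio"]
    + rng.normal(0, 2, size=n)
)

# Step 2: response is medv, predictors are every other column.
y = boston["medv"]
X = boston.drop(columns="medv")

# Steps 3 and 4: preview predictors and response.
print(X.head())
print(y.head())

# Steps 5 and 6: split (assumed 80/20) and review the dimensions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```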

Step 7: Commence model training

Step 8: Examine chosen model coefficients
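
Steps 7 and 8 can be sketched as below. Pairing `coef_` with the column names in a pandas Series is a presentation choice of this sketch, not necessarily what the original code did. The data is synthetic, so the coefficient values only reflect the made-up generating formula.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Boston dataset), three predictors only.
rng = np.random.default_rng(0)
n = 100
boston = pd.DataFrame(
    {
        "rm": rng.uniform(4, 8, size=n),
        "lstat": rng.uniform(2, 38, size=n),
        "ptratio": rng.uniform(13, 22, size=n),
    }
)
boston["medv"] = (
    5 + 5 * boston["rm"] - 0.5 * boston["lstat"] - 0.5 * boston["ptratio"]
    + rng.normal(0, 2, size=n)
)
X = boston.drop(columns="medv")
y = boston["medv"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 7: fit one model on all predictors at once.
model = LinearRegression().fit(X_train, y_train)

# Step 8: one coefficient per predictor, labeled by column name.
coefficients = pd.Series(model.coef_, index=X.columns)
print(coefficients)
print("intercept:", model.intercept_)
```

Each coefficient is read the same way as in the simple case: the expected change in medv for a one-unit increase in that predictor, with the other predictors held fixed.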


Step 9: Generate predictions

Step 10: Compare predicted values with actual values

Step 11: Assess model performance
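
The final steps of the multivariate workflow (predicting, comparing, and evaluating) can be sketched as below. The metrics shown (MAE, RMSE, and R squared) are assumptions, since the original evaluation code is not shown, and the data is a synthetic stand-in with three made-up predictors.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Boston dataset), three predictors only.
rng = np.random.default_rng(0)
n = 100
boston = pd.DataFrame(
    {
        "rm": rng.uniform(4, 8, size=n),
        "lstat": rng.uniform(2, 38, size=n),
        "ptratio": rng.uniform(13, 22, size=n),
    }
)
boston["medv"] = (
    5 + 5 * boston["rm"] - 0.5 * boston["lstat"] - 0.5 * boston["ptratio"]
    + rng.normal(0, 2, size=n)
)
X_train, X_test, y_train, y_test = train_test_split(
    boston.drop(columns="medv"), boston["medv"], test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# Predictions for the held-out rows, next to the actual values.
y_pred = model.predict(X_test)
comparison = pd.DataFrame({"actual": y_test.to_numpy(), "predicted": y_pred})
print(comparison.head())

# Error metrics on the test set.
mae = mean_absolute_error(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
r2 = r2_score(y_test, y_pred)
print("MAE:", mae, "RMSE:", rmse, "R^2:", r2)
```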

What Did We Learn?

In this module, we have covered the basics of linear regression in Python, including the best-fit line, the coefficient of x, and how to build simple and multiple linear regression models using sklearn. In the next module, we will discuss logistic regression, which is a type of regression analysis that is used to predict the probability of an event occurring.

About the Author

Principal Data Scientist

Meet Akash, a Principal Data Scientist with expertise in advanced analytics, machine learning, and AI-driven solutions. With a master’s degree from IIT Kanpur, Aakash combines technical knowledge with industry insights to deliver impactful, scalable models for complex business challenges.