What is Regression in Machine Learning?

Regression in machine learning is a statistical approach that models the relationship between a dependent variable and one or more independent variables. It is primarily used to predict continuous outcomes, such as sales forecasts, stock prices, or real estate values.

In this article, we will learn about regression in machine learning, its different algorithms, and the terms used in regression analysis.

What is Regression?

Regression is a statistical method used to understand and predict the relationship between variables. It helps us determine how one or more independent variables (inputs) affect the dependent variable (output). For example, we can use regression to predict a person’s weight based on their height. It is widely used in fields such as economics, finance, social sciences, and machine learning to analyze trends.

Regression in Machine Learning

Regression in machine learning is a supervised learning technique used to model the relationship between a dependent variable and one or more independent variables. It learns a mapping from inputs to outputs; for example, it can predict house prices based on size and location. It is used for continuous targets, not categories. Many regression algorithms exist, but linear regression is the most widely used. It fits a best-fit line (or plane, when there are several inputs) to the data and uses it to make predictions.

Importance of Regression

Here is why regression is important in machine learning:

  1. Continuous target variables: Regression analysis helps us predict continuous target variables, that is, numeric values. E.g., predicting house prices or climate trends.
  2. Error Management: Regression models are trained to minimize the error between the predicted values and the actual data.
  3. Overfitting and underfitting: Regression models can be tuned, for example through regularization, to handle overfitting and underfitting.
  4. Model Complexity: Regression models range from simple linear models to complex nonlinear models.
  5. Identifying Patterns: Regression analysis helps us identify patterns and relationships in a dataset that can then be applied to new data.

Terms Used in Regression Analysis in Machine Learning

Here are some commonly used terms in regression analysis:

  1. Dependent Variable: It is also known as the response variable or outcome variable. It is the variable predicted or explained by the regression model. It is denoted as Y.
  2. Independent Variable: It is referred to as the predictor variable or explanatory variable. It is the variable used to predict or explain the variation in the dependent variable. It is denoted as X.
  3. Outliers: Outliers are values in a data set that are much higher or lower than most other values. Because they can distort the fitted model, they are usually examined and often handled separately or excluded before fitting.
  4. Multicollinearity: When two or more independent variables are highly correlated with each other, it is called multicollinearity. It should be avoided because it makes it hard to rank the independent variables by their influence on the dependent variable.
  5. Overfitting and underfitting: If the algorithm performs well on the training data set but poorly on the test data set, it is called overfitting. If it performs poorly even on the training data set, it is known as underfitting.
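
To make the idea of multicollinearity concrete, here is a minimal sketch that checks pairwise correlations between independent variables with NumPy. The three features (x1, x2, x3) and the 0.9 threshold are assumptions chosen for illustration, not values from a real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical features: x2 is almost a copy of x1, x3 is unrelated to both
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Pairwise Pearson correlations between the columns of X
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))

# Flag feature pairs whose absolute correlation exceeds an assumed threshold
threshold = 0.9
high_pairs = [
    (i, j)
    for i in range(corr.shape[0])
    for j in range(i + 1, corr.shape[1])
    if abs(corr[i, j]) > threshold
]
print("Highly correlated feature pairs (column indices):", high_pairs)
```

A flagged pair suggests that one of the two features could be dropped or combined with the other before fitting a regression model.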

Types of Regression

The main types of regression are:

  • Linear Regression: Linear regression is the most basic and one of the most widely used forms of regression. It assumes a linear relationship between the dependent variable and the independent variables. It aims to fit the line that best represents the data points and uses it to predict the outcome. Simple linear regression involves a single independent variable, while multiple linear regression deals with multiple independent variables.
Linear Regression Formula: y = b0 + b1*x + e, where b0 is the intercept, b1 is the slope (coefficient), and e is the error term.

It is called linear regression because it models the dependent variable (Y-axis) as a linear function of the independent variables (X-axis).

  • Logistic regression: Logistic Regression is used when the dependent variable is binary or categorical. It models the probability of an event occurring by fitting a logistic function to the independent variables. The output is a probability score that can be used to classify instances into different classes. It is widely used in classification problems.
  • Polynomial Regression: It is an extension of linear regression. It captures nonlinear relationships between the dependent and independent variables. It fits a polynomial equation of a specified degree to the data. By including polynomial terms, we can create curved lines to better fit the data and capture complex patterns.
  • Time Series Regression: Time series regression deals with data that changes over time, where the dependent variable is influenced by its own past values and other independent variables.
  • Support Vector Regression (SVR): SVR is a machine learning algorithm for predicting continuous values. It is based on the SVM (Support Vector Machine) algorithm. Instead of trying to match every point exactly, it finds a function that fits the data within a specified margin of tolerance. Only the most important points, called support vectors, determine the fitted function. SVR can also handle complex data patterns using kernel functions such as the RBF or polynomial kernel.
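
To see how these types differ on curved data, here is a minimal sketch that fits linear regression, polynomial regression, and SVR to the same synthetic dataset using Scikit-learn. The data-generating curve and the hyperparameters (degree=2, C=10.0, epsilon=0.1) are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR

# Synthetic curved data (assumed for illustration): y = 0.5*x^2 - x + 2 + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + 2 + rng.normal(scale=0.3, size=200)

models = {
    "linear": LinearRegression(),
    # Polynomial regression = polynomial feature expansion + linear regression
    "polynomial (degree 2)": make_pipeline(
        PolynomialFeatures(degree=2), LinearRegression()
    ),
    # SVR is sensitive to feature scale, so the input is standardized first
    "SVR (RBF kernel)": make_pipeline(
        StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)  # R-squared on the training data
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

On data like this, the straight line cannot follow the curve, while the polynomial and kernel-based models can. The scores here are measured on the training data, so they show how flexible each model is, not how well it generalizes.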

Here are some other types of regression as well:

  • Decision Tree Regression: Decision tree regression is a method for predicting continuous values using a tree-like structure that consists of nodes and branches. Each internal node represents a decision on a feature, each branch represents an outcome of that decision, and each leaf node represents the final predicted value. The goal of this algorithm is to build a tree that can accurately predict the target value for new data points.
  • Random Forest Regression: It is an advanced version of decision tree regression that uses multiple decision trees. It builds several trees on random subsets of the data and features, then predicts the average of the results from all the trees. This reduces overfitting and usually improves accuracy compared to a single decision tree.
  • Multiple Regression: Multiple regression is a statistical technique used to analyze the relationship between a dependent variable and two or more independent variables. It extends the concept of simple linear regression, which involves only one independent variable, to a scenario where multiple independent variables are considered simultaneously.
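
The difference between a single decision tree and a random forest can be sketched as follows. This is a minimal example on synthetic data with two independent variables; the data-generating function and settings such as n_estimators=100 are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumed): two features, a nonlinear signal, and noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A fully grown tree can memorize the training data, noise included
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# A forest averages many trees built on random subsets of rows and features
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

tree_train_r2 = tree.score(X_train, y_train)
tree_test_r2 = tree.score(X_test, y_test)
forest_test_r2 = forest.score(X_test, y_test)

print(f"Decision tree  R^2 (train): {tree_train_r2:.3f}")
print(f"Decision tree  R^2 (test):  {tree_test_r2:.3f}")
print(f"Random forest  R^2 (test):  {forest_test_r2:.3f}")
```

A large gap between the tree's training and test scores is a sign of overfitting. Averaging many trees typically narrows that gap.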

Regularized Linear Regression Techniques

  1. Ridge Regression: Ridge regression is a regularized form of linear regression that helps to reduce overfitting. Overfitting occurs when the algorithm performs well on the training data set but poorly on the test data set. Ridge adds a penalty on the squared size of the weights to the loss function, which shrinks the weights toward zero.
  2. Lasso regression: Like ridge regression, Lasso regression is a regularized linear regression technique that is used to prevent overfitting. It adds a penalty on the absolute size of the weights to the loss function, which forces the model to keep some weights and set others exactly to zero.
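
The effect of the two penalties can be seen in a minimal sketch like the one below. The synthetic data (five features, of which only the first two influence the target) and the alpha values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data (assumed): only the first two of five features matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)   # no penalty
ridge = Ridge(alpha=10.0).fit(X, y)  # penalty on squared weights
lasso = Lasso(alpha=0.5).fit(X, y)   # penalty on absolute weights

print("Linear:", np.round(ols.coef_, 3))
print("Ridge: ", np.round(ridge.coef_, 3))
print("Lasso: ", np.round(lasso.coef_, 3))
print("Weights set to zero by Lasso:", int(np.sum(lasso.coef_ == 0)))
```

Ridge shrinks all weights but keeps them nonzero, while Lasso removes the weights of the irrelevant features entirely, which works as a form of feature selection.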

Regression Model Machine Learning

Let’s understand linear regression with an example. In the program below, we use the Scikit-learn library to train the model and visualize the results with Matplotlib and Seaborn in Python.

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Generate some sample data 
np.random.seed(0) 
X = np.random.rand(100, 1) * 10  
y = 2*X + 3 + np.random.randn(100, 1) * 2  

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Print the coefficients
print("Intercept:", model.intercept_[0])
print("Coefficient:", model.coef_[0][0])


# Plot all data points with seaborn's regplot.
# Note: regplot fits and draws its own regression line on the full dataset;
# it does not plot the model trained above.
plt.figure(figsize=(8, 6))
sns.regplot(x=X.flatten(), y=y.flatten(), line_kws={"color": "red"}, scatter_kws={'color':'blue'})  
plt.xlabel("X")
plt.ylabel("y")
plt.title("Linear Regression with Seaborn regplot")
plt.grid(True)
plt.show()


# Evaluate the model
from sklearn.metrics import mean_squared_error, r2_score

# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")

Output:

Intercept: 3.4126803774228662
Coefficient: 1.9961036404019565

[Plot: Linear Regression with Seaborn regplot]

Mean Squared Error: 3.67
R-squared: 0.85

Evaluation Metrics for Regression Models

Here are some of the metrics that are used for evaluating regression models.

  1. Mean Squared Error (MSE): It measures the average squared difference between the predicted values and the actual values of the dependent variable.
  2. Mean Absolute Error (MAE): It calculates the average absolute difference between the predicted and actual values. It measures the average prediction error.
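  3. Root Mean Squared Error (RMSE): It is the square root of the MSE, which expresses the typical size of the prediction error in the original units of the dependent variable.
  4. R-squared (R²): It measures the proportion of the variance in the dependent variable that the model explains. It usually ranges from 0 to 1, and higher values indicate a better fit of the model to the data. It can be negative when the model fits worse than always predicting the mean.
  5. Mean Percentage Error (MPE): It calculates the average percentage difference between the predicted and actual values. Because positive and negative errors can cancel out, it indicates the direction of the bias rather than the overall size of the error.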
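
To show how these metrics are computed, here is a minimal sketch that calculates each of them by hand with NumPy. The actual and predicted values are made up for illustration, and the sign convention for MPE (actual minus predicted, divided by actual) is an assumption, since conventions vary.

```python
import numpy as np

# Made-up actual and predicted values (for illustration only)
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 330.0, 380.0])

errors = y_true - y_pred  # [-10, 10, -30, 20]

mse = np.mean(errors ** 2)     # average squared error
mae = np.mean(np.abs(errors))  # average absolute error
rmse = np.sqrt(mse)            # same units as the target

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# MPE with the assumed convention (actual - predicted) / actual, in percent
mpe = np.mean(errors / y_true) * 100

print(f"MSE:  {mse:.2f}")    # 375.00
print(f"MAE:  {mae:.2f}")    # 17.50
print(f"RMSE: {rmse:.2f}")   # 19.36
print(f"R^2:  {r2:.2f}")     # 0.97
print(f"MPE:  {mpe:.2f}%")   # -2.50%
```

Note that the MPE is small here even though every prediction is off by 5% to 10%, because the positive and negative percentage errors partly cancel out.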

Applications of Regression

Regression has many applications in fields such as machine learning and data science:

  1. Sales Forecasting: We can predict future sales based on historical sales data, seasonality, and other factors.
  2. Predicting Price: Regression can help us predict the price of a house based on its size, locality, and other features.
  3. Predict Weather: Climate scientists use regression analysis to help predict the weather.
  4. Stock Prices: In the stock market, people use regression to estimate price trends and invest in stocks accordingly.
  5. Healthcare: Healthcare professionals use regression analysis to predict patient outcomes and plan treatment accordingly.
  6. Customer lifetime value prediction: We can predict a customer’s lifetime value to the company based on their past purchase history and behavior.

Conclusion

In conclusion, regression is a supervised machine learning technique used to model the relationship between a dependent variable and one or more independent variables and to predict continuous values. It is used in various fields such as machine learning, data science, economics, healthcare, and sales forecasting.

Throughout this article, we have covered the types of regression, the terms used in regression, its applications, and its importance.

FAQs – Regression in machine learning

What is the use of regression?

Regression is used to predict continuous values in fields such as machine learning, data science, economics, healthcare, and sales forecasting.

What is Linear Regression?

Linear regression is the most basic and one of the most widely used forms of regression. It assumes a linear relationship between the dependent variable and the independent variables.

What is the difference between regression and classification?

Regression is used to predict continuous values, while classification is used to predict discrete categories.

Is regression supervised or unsupervised?

Regression is a supervised learning technique.

About the Author

Principal Data Scientist

Meet Akash, a Principal Data Scientist with expertise in advanced analytics, machine learning, and AI-driven solutions. With a master’s degree from IIT Kanpur, Aakash combines technical knowledge with industry insights to deliver impactful, scalable models for complex business challenges.
