Cost Function in Machine Learning

A cost function, also known as a loss function or objective function, is a mathematical metric that quantifies the difference between a model’s predicted values and the actual values, and is used to evaluate the model’s performance.

In this blog, we’ll talk about cost functions—why they matter, their different kinds, and the important idea of gradient descent. We’ll also see a practical application of the cost function to improve the overall performance of the machine-learning model. Below are the contents we will go through in this blog post:

Table of contents:

Why Do We Use a Cost Function?
Types of Cost Functions
What is Gradient Descent?
Cost Function for Linear Regression
Cost Function for Neural Networks
How to Implement Cost Functions in Python
Conclusion
FAQs

Why Do We Use a Cost Function?

The main objective of a cost function is to guide the training process of a machine learning model by providing a numerical measure of its errors, which is then minimized to improve model performance.

Let’s consider an example: Suppose we have a dataset that contains the speed and mileage of cars and bicycles, and we need to classify them. If we plot the records using these two parameters, we will get a scatter plot as below:

As you can see, the blue points are cars and the green points are bicycles. How can we classify this data? The obvious answer is to find a classifier that separates the two classes. Suppose we found three candidate solutions, as depicted in the graphs below:

Although all three classifiers have high accuracy, the third solution is the best because it classifies every data point correctly. It does so because its decision boundary lies roughly midway between the two groups, rather than close to either one.

We need a cost function to obtain such a result. It tells you how badly the model mispredicted by calculating the difference between the actual and predicted values. Moreover, the cost function is the quantity that, once minimized, leads you to the optimal solution.

Types of Cost Functions in Machine Learning

There are mainly three types of cost functions in ML, as below:

Regression cost function
Binary classification cost function
Multi-class classification cost function

Let’s discuss these cost functions one by one.

1. Regression Cost Function

Regression models are used to make continuous predictions, such as the price of a house, the forecasted temperature, or a person’s likelihood of receiving a loan. A regression cost function is simply a means of measuring the accuracy of those predictions: the “cost,” or the amount by which we missed, is determined by comparing our estimate with the actual result.

A regression cost function is further classified into three types: mean error, mean squared error, and mean absolute error.

1.1. Mean Error

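The mean error is the average of the signed errors made in predictions: all the errors are summed and the total is divided by the number of observations. Because positive and negative errors can cancel each other out, a mean error near zero does not guarantee accurate predictions, so it is rarely used as a cost function on its own and mainly serves as a basis for the other two.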
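To see why the mean error can be misleading, here is a minimal sketch in Python. The function name `mean_error` and the sample values are made up for illustration; every prediction below is off by 5, yet the signed errors cancel.

```python
import numpy as np

def mean_error(y_true, y_pred):
    """Average signed error; positive and negative errors can cancel out."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(y_true - y_pred)

# Hypothetical values chosen for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [15.0, 15.0, 35.0, 35.0]  # errors are -5, +5, -5, +5

print(mean_error(y_true, y_pred))  # 0.0, even though no prediction is correct
```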

1.2. Mean Squared Error

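In regression, the mean squared error (MSE) is a commonly used metric for how well a model predicts continuous outcomes. It is the average of the squared differences between the predicted and actual values; squaring removes the sign of each error and penalizes large errors more heavily.

Formula:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

where n is the number of observations, y_i is the actual value, and \hat{y}_i is the predicted value for the ith observation.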

1.3. Mean Absolute Error (MAE)

Another technique for determining how inaccurate our predictions are is the mean absolute error (MAE). Unlike mean squared error (MSE), which squares the discrepancies between our estimates and the actual results, MAE simply considers how far off we are, regardless of whether we’re too high or too low.

It’s similar to stating, “Let’s just see how far away our guesses are from the real answers, without worrying about whether we’re overestimating or underestimating.”

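Formula:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

where n is the number of observations, y_i is the actual value, and \hat{y}_i is the predicted value for the ith observation.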
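The practical difference between MSE and MAE shows up when one prediction is far off. The sketch below uses hypothetical helper names (`mse`, `mae`) and made-up data: in the first set every error has magnitude 1, in the second a single prediction misses by 8.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared errors."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(diff ** 2)

def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(diff))

# Hypothetical values chosen for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
close_pred = [2.0, 6.0, 6.0, 10.0]    # every error has magnitude 1
outlier_pred = [3.0, 5.0, 7.0, 17.0]  # a single error of magnitude 8

print(mse(y_true, close_pred), mae(y_true, close_pred))      # 1.0 1.0
print(mse(y_true, outlier_pred), mae(y_true, outlier_pred))  # 16.0 2.0
```

The single large miss raises the MSE sixteen-fold but only doubles the MAE, which is why MSE is considered more sensitive to outliers.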

2. Binary Classification Cost Function

The binary classification cost function is used for classification models that predict one of two categorical values, such as 0 or 1, or true or false.

Categorical cross-entropy is one of the most commonly used loss functions for classification. Binary cross-entropy is a special case of categorical cross-entropy.

Let’s consider an example and understand cross-entropy in detail. Suppose we have a binary classification problem where we are predicting whether an email is spam (class 1) or not (class 0).

The machine learning model will output a probability for each class:

Output = [P(Not Spam), P(Spam)]

The actual probability distribution for each class is as follows:

Not Spam = [1, 0]
Spam = [0, 1]

During training, if the input email is indeed spam (class Spam), we want the predicted probability distribution to be closer to the actual distribution of spam.
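As a rough illustration of this idea, the sketch below computes the cross-entropy between the actual distribution for a spam email and two hypothetical model outputs. The function name `cross_entropy` and the probabilities are assumptions made for this example.

```python
import numpy as np

def cross_entropy(actual, predicted, eps=1e-15):
    """Cross-entropy between a one-hot actual distribution and predicted probabilities."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.clip(np.asarray(predicted, dtype=float), eps, 1.0)  # avoid log(0)
    return -np.sum(actual * np.log(predicted))

spam = [0, 1]  # actual distribution [P(Not Spam), P(Spam)] for a spam email

# Hypothetical model outputs for the same email.
confident_right = cross_entropy(spam, [0.1, 0.9])  # about 0.105
confident_wrong = cross_entropy(spam, [0.9, 0.1])  # about 2.303

print(confident_right, confident_wrong)
```

The closer the predicted distribution is to the actual one, the smaller the cross-entropy, so minimizing it pushes P(Spam) toward 1 for spam emails.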

3. Multi-class Classification Cost Function

A multi-class classification cost function is used in classification scenarios where instances are assigned to one of more than two classes. As in binary classification, cross-entropy (categorical cross-entropy) is commonly used here.

This cost function is designed for multi-class classification, where the target takes one of the class labels 0, 1, 2, …, n. Cross-entropy calculates a score that captures the average difference between the actual and predicted probability distributions.
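A minimal sketch of categorical cross-entropy averaged over several examples might look like the following. The function name, the three-class setup, and the probabilities are all hypothetical; labels are assumed to be integers starting at 0.

```python
import numpy as np

def categorical_cross_entropy(labels, probs, eps=1e-15):
    """Mean cross-entropy for integer class labels and per-class predicted probabilities."""
    labels = np.asarray(labels)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)  # avoid log(0)
    # Pick the predicted probability of the true class for every example.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))

# Hypothetical three-class problem with three examples.
labels = [0, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]

loss = categorical_cross_entropy(labels, probs)
print(loss)  # about 0.520
```

Because the actual distribution is one-hot, only the probability assigned to the true class contributes to each example's loss.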

What is Gradient Descent?

Gradient descent is an optimization algorithm that repeatedly adjusts a model’s parameters to reduce its cost function. It is frequently used for model training in machine learning and deep learning.

The fundamental principle of gradient descent is to move the parameters in the direction that decreases the cost function most quickly. This direction is the negative gradient of the cost function with respect to the parameters.

Starting from an initial set of parameter values, the following steps are repeated to update the parameters:

1. Compute the gradient of the cost function with respect to each parameter.

2. Update each parameter by taking a small step in the opposite direction of the gradient.

3. Repeat until a predetermined number of iterations is reached or the cost function stops decreasing.
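The steps above can be sketched in a few lines of Python. This is a simplified illustration for a single parameter, not a production optimizer; the function name `gradient_descent`, the example cost f(w) = (w - 3)^2, and the default settings are assumptions made for this example.

```python
def gradient_descent(grad, start, learning_rate=0.1, n_iterations=100):
    """Repeatedly step against the gradient for a fixed number of iterations."""
    w = start
    for _ in range(n_iterations):
        w = w - learning_rate * grad(w)  # small step opposite to the gradient
    return w

# Example cost: f(w) = (w - 3)^2, whose gradient is 2 * (w - 3) and minimum is at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(w_min)  # very close to 3
```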

There are different variants of gradient descent, such as:

Batch gradient descent
Stochastic gradient descent
Mini-batch gradient descent

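They differ in how much of the training data is used to compute each gradient update: batch gradient descent uses the entire dataset, stochastic gradient descent uses a single example, and mini-batch gradient descent uses a small subset. Each variant has its own benefits and suits different dataset sizes and optimization problems.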
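One way to picture the difference is to count how many parameter updates each variant performs in a single pass (epoch) over the data. The sketch below only generates the index batches; the helper `iterate_batches` and the dataset size of 12 are hypothetical.

```python
import numpy as np

def iterate_batches(n_samples, batch_size, rng):
    """Yield shuffled index batches that cover the dataset once (one epoch)."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
n_samples = 12  # hypothetical dataset size

# Each batch of indices corresponds to one gradient computation and one update.
updates_per_epoch = {
    "batch": len(list(iterate_batches(n_samples, n_samples, rng))),
    "stochastic": len(list(iterate_batches(n_samples, 1, rng))),
    "mini-batch": len(list(iterate_batches(n_samples, 4, rng))),
}
print(updates_per_epoch)  # {'batch': 1, 'stochastic': 12, 'mini-batch': 3}
```

Fewer, larger batches give smoother but more expensive updates; many small batches give cheaper but noisier ones.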

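For a linear regression model that predicts m x_i + b, with the MSE cost function J(m, b) = \frac{1}{n} \sum_{i=1}^{n} (y_i - (m x_i + b))^2, each gradient descent step updates the parameters as follows, where \alpha is the learning rate:

$$m := m - \alpha \frac{\partial J}{\partial m} = m + \alpha \, \frac{2}{n} \sum_{i=1}^{n} x_i \left( y_i - (m x_i + b) \right)$$

$$b := b - \alpha \frac{\partial J}{\partial b} = b + \alpha \, \frac{2}{n} \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)$$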
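As a rough sketch of how this looks in code, the example below fits a straight line with batch gradient descent. It assumes a model of the form y = m x + b and an MSE cost; the function name `fit_line`, the learning rate, the iteration count, and the data are all chosen for illustration.

```python
import numpy as np

def fit_line(x, y, learning_rate=0.05, n_iterations=2000):
    """Fit y = m * x + b by batch gradient descent on the MSE cost."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iterations):
        residual = y - (m * x + b)
        grad_m = -(2.0 / n) * np.sum(x * residual)  # dJ/dm
        grad_b = -(2.0 / n) * np.sum(residual)      # dJ/db
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

# Hypothetical noise-free data generated from y = 2x + 1.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

m, b = fit_line(x, y)
print(m, b)  # close to 2 and 1
```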

Cost Function for Linear Regression

Linear regression models the dependent variable as a linear function of the independent variables and looks for the line that best fits the data.

Here, the cost function measures how far the regression line lies from the data points. Training adjusts the line’s parameters so that it passes as close to the points as possible, which corresponds to the lowest cost.

A well-fitting linear regression model appears as follows:

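Mathematically, the MSE cost function for linear regression is defined as:

$$J(m, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2$$

where n is the number of data points, x_i and y_i are the independent and dependent variables for the ith data point, and m and b are the slope and intercept of the regression line.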
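To make the link between fit quality and cost concrete, the sketch below evaluates the MSE cost of three candidate lines on the same data. It assumes a line of the form y = m x + b; the function name `linear_regression_cost` and the data are hypothetical.

```python
import numpy as np

def linear_regression_cost(m, b, x, y):
    """MSE cost of the line y = m * x + b on the data (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((y - (m * x + b)) ** 2)

# Hypothetical noise-free data generated from y = 2x + 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

cost_good = linear_regression_cost(2.0, 1.0, x, y)   # the line that generated the data
cost_flat = linear_regression_cost(0.0, 6.0, x, y)   # horizontal line through the mean
cost_steep = linear_regression_cost(3.0, 1.0, x, y)  # slope too large

print(cost_good, cost_flat, cost_steep)  # 0.0 5.0 7.5
```

The well-fitting line has the lowest cost, which is exactly what gradient descent searches for.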

Cost Function for Neural Networks

A neural network is a type of machine learning model that receives various inputs and passes them through layers of interconnected units. Each unit computes a weighted sum of its inputs and applies an activation function, and the last layer produces the final result.

The cost function of a neural network is computed from its final output: for each training example, the prediction is compared with the actual value, and the resulting errors are combined into a single overall error. That error is then propagated backward through the layers (backpropagation) to work out how much each layer’s weights contributed to it.

To minimize the cost function of a neural network, we use gradient descent. As the value of the cost function is determined by the difference between the predicted and actual values, its gradient shows how the error changes as each weight changes.
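The sketch below shows one way to compute the cost of a small network. Everything about the architecture is an assumption made for illustration: one hidden layer with a sigmoid activation, a linear output, random weights and data, and MSE as the cost.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w1, b1, w2, b2):
    """Forward pass: one sigmoid hidden layer followed by a linear output layer."""
    hidden = sigmoid(x @ w1 + b1)
    return hidden @ w2 + b2

def network_cost(x, y, w1, b1, w2, b2):
    """MSE between the network's output and the actual values."""
    predictions = forward(x, w1, b1, w2, b2)
    return np.mean((y - predictions) ** 2)

# Hypothetical data and randomly initialized weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))   # 8 examples, 3 input features
y = rng.normal(size=(8, 1))   # 8 target values
w1, b1 = rng.normal(size=(3, 4)), np.zeros(4)  # hidden layer with 4 units
w2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # output layer

cost = network_cost(x, y, w1, b1, w2, b2)
print(cost)
```

Note that the cost only looks at the output layer's predictions; the hidden layer influences it through the weights that training adjusts.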

1. Cost Function Formula for Neural Networks

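As we have seen so far, a cost function measures the difference between the actual output value and the predicted output value. In the simplest case, where the prediction is the linear function m x_i + b, it takes the same mean squared error form as in linear regression; for a neural network, m x_i + b is replaced by the network’s output for the ith data point:

$$J(m, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2$$

Here’s an explanation of the parameters in the formula:

n: Number of data points in the dataset.
y_i: Actual value of the dependent variable for the ith data point.
x_i: Value of the independent variable for the ith data point.
m and b: Parameters of the linear regression model (slope and intercept).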

How to Implement Cost Functions in Python

Here are examples of how you can implement two common cost functions, the mean squared error (MSE) and the binary cross-entropy in Python:

1. Code for Mean Squared Error (MSE) in Python

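import numpy as np

def mean_squared_error_manual(y_true, y_pred):
    # Average of the squared differences between actual and predicted values
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)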

2. Code for Binary Cross Entropy in Python

import numpy as np

def binary_cross_entropy(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)  # Labels must be 0 or 1
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)  # Prevent log(0) errors
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

Conclusion

In machine learning, the cost function gives algorithms a signal to learn from. It measures a model’s performance by putting a number on the discrepancy between predicted and actual results, and minimizing it is essential to training models that make accurate predictions. Future developments in cost function design may help machine learning models become more accurate and efficient as applications spread across industries such as technology, healthcare, and finance.

FAQs

What is the role of a cost function in training a machine learning model?

A cost function’s role is to measure the discrepancy between predicted and actual values, which helps the model make the necessary parameter adjustments during training.

What is the cost function formula?

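A cost function is fundamentally a measure of the difference between the actual output value and the predicted output value. It is mathematically represented as:

Cost function (J) = 1/n (Sum of Loss error for ‘n’ examples)

Where the loss error for each example is computed from the difference between the actual output and the predicted output, for example its square in MSE or its absolute value in MAE.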

How do we minimize the cost function to optimize a machine-learning model?

We use optimization procedures such as gradient descent, which repeatedly modify the model parameters to lower the overall error and thereby minimize the cost function.

Can you explain the difference between a loss function and a cost function in machine learning?

A loss function assesses the error for a single training sample. A cost function aggregates these losses over all examples, representing the total model error to be reduced.
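A tiny sketch makes the distinction concrete. It assumes squared error as the per-example loss and the mean as the aggregation; the values are made up.

```python
import numpy as np

# Hypothetical actual and predicted values for three training examples.
y_true = np.array([2.0, 4.0, 6.0])
y_pred = np.array([2.5, 3.0, 6.0])

per_example_loss = (y_true - y_pred) ** 2  # loss: one value per training example
cost = per_example_loss.mean()             # cost: a single number for the whole dataset

print(per_example_loss)  # [0.25 1.   0.  ]
print(cost)              # about 0.4167
```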

What is the significance of the learning rate in training a machine learning model?

The learning rate determines how quickly or slowly a model learns from the data. Choosing the right learning rate is crucial, as it affects the speed of convergence and the quality of the final model.
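The effect is easy to see on a toy problem. The sketch below minimizes f(w) = w^2 with three hypothetical learning rates over 20 iterations; the function name and the specific rates are chosen only for illustration.

```python
def run_gradient_descent(learning_rate, n_iterations=20, start=1.0):
    """Minimize f(w) = w**2 (gradient 2 * w) and return the final distance from the minimum."""
    w = start
    for _ in range(n_iterations):
        w -= learning_rate * 2 * w
    return abs(w)

too_small = run_gradient_descent(0.001)  # barely moves from the starting point
reasonable = run_gradient_descent(0.1)   # gets close to the minimum at 0
too_large = run_gradient_descent(1.1)    # overshoots more on every step and diverges

print(too_small, reasonable, too_large)
```

A rate that is too small makes training slow, while one that is too large makes each step overshoot the minimum, so the error grows instead of shrinking.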