What is the Normal Distribution?

The normal distribution, often called the bell curve, is central to statistics. Its symmetrical shape and well-understood properties make it a natural model for many kinds of data, from natural phenomena to financial measurements. The distribution is fully described by its mean and standard deviation, and it underpins statistical inference, hypothesis testing, and many machine learning methods. In this article, we will discuss its key features and importance.

Introduction to Normal Distribution

The normal distribution, also called the Gaussian distribution, is a continuous probability distribution that is symmetric about its mean. Values near the mean occur more frequently than values far from the mean. If you plot the normal distribution, it takes the form of a “bell curve.”

1. Why is it Important?

The normal distribution is significant for several reasons. It tends to arise naturally (a consequence of the central limit theorem), so it is a good model for much real-world data. It is also the foundation of statistical inference, hypothesis testing, and confidence intervals. Many other statistical distributions either derive from or are connected to the normal distribution.

2. Real-World Application of Normal Distribution

The normal distribution occurs in many fields. For instance, people’s heights and weights, blood pressure, exam scores, measurement errors in science, and stock returns are often approximately normally distributed. Because it applies so widely, it is a very important tool in statistics and data analysis.

Key Characteristics of the Normal Distribution

1. Symmetry and the Bell Curve

Many empirical frequency distributions have the following characteristics:

    1. They are approximately symmetrical, and the mode is close to the center of the distribution.
    2. The shape of the distribution can be approximated by a bell: nearly flat on top, then decreasing more quickly, then decreasing more slowly toward the tails of the distribution.

2. Mean, Median and Mode

In a perfectly normal distribution, the mean, median, and mode are equal and coincide at the center of the distribution. The mean is the distribution’s center of gravity, the median is the middle value, and the mode is the most frequently occurring value. This equality holds for any symmetric unimodal distribution, and the normal distribution is the classic example.

3. Standard Deviation and Spread

The standard deviation, denoted by σ, measures the spread or dispersion of the data around the mean. A large standard deviation indicates a wider and flatter curve, whereas a small standard deviation indicates a narrower and taller curve. It quantifies how much individual data points typically deviate from the mean.

4. The Empirical Rule (68-95-99.7 Rule)

The empirical rule, also referred to as the 68-95-99.7 rule, provides an easy method to estimate the chance of data occurring within a specified range of values from the mean in a normal distribution:

1. About 68% of the data lies within one standard deviation of the mean (μ ± 1σ).

2. Roughly 95% of data lies within two standard deviations of the mean, μ ± 2σ.

3. About 99.7% of the data lies within three standard deviations of the mean (μ ± 3σ).

This rule is very helpful in understanding how spread out the data is, as well as the probability attached to different values in a normal distribution.
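As a quick sanity check, the empirical rule can be reproduced numerically — a minimal sketch, assuming SciPy is installed:

```python
from scipy.stats import norm

# For any normal distribution, the probability of falling within k standard
# deviations of the mean equals cdf(k) - cdf(-k) in standard (z) units.
coverage = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}
print(coverage)  # roughly {1: 0.6827, 2: 0.9545, 3: 0.9973}
```

The exact values are 0.6827, 0.9545, and 0.9973; the name “68-95-99.7” simply rounds them.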

[Figure: histogram of the thickness of a metal part]
A theoretical distribution that has the stated characteristics and can be used to approximate many empirical distributions was devised more than two hundred years ago. It is called the “normal probability distribution,” or the normal distribution. It is sometimes called the Gaussian distribution.

Normal Probability Density Function

The probability density function (PDF) describes the relative likelihood of observing a given value of a continuous random variable. It’s not a probability itself, but the area under the curve between two points represents the probability of the variable falling within that range. The probability density function for the normal distribution is given by:
f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²)),  for −∞ < x < ∞

In the formula, μ (mu) is the population mean, σ (sigma) is the population standard deviation, and π (pi) is a mathematical constant approximately equal to 3.14159. The PDF shows that a normal distribution is completely defined by these two parameters.
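The formula can be coded directly and checked against a library implementation — a sketch, assuming SciPy is available:

```python
import math

from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """Normal probability density evaluated straight from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coeff * math.exp(exponent)

# Agrees with SciPy's implementation at any point.
print(normal_pdf(1.0, mu=0.0, sigma=1.0))  # ~0.2420
print(norm.pdf(1.0))                        # same value
```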

1. Visual Representation of the PDF

The bell-shaped curve visually represents the PDF. The higher the curve at a point, the more likely values near that point are to occur. The total area under the curve equals 1, representing the total probability. The density function extends from –∞ to +∞. Its shape is:
[Figure: shape of the normal distribution]

Calculating Probabilities with the Normal Distribution

1. The Concept of Integration

Because the normal distribution is continuous, we can’t just add probabilities as we do with discrete variables. We instead use integration to find the area under the curve, which is the probability. The integral of the PDF between two values (x1 and x2) gives the probability that the random variable X falls between x1 and x2.

P(x1 < X < x2) = ∫[x1 to x2] f(x) dx
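Numerically, this integral can be evaluated with a quadrature routine, and it matches the difference of CDF values — a sketch using SciPy, where the interval −1 to 2 is an arbitrary example:

```python
from scipy import integrate
from scipy.stats import norm

x1, x2 = -1.0, 2.0

# Area under the standard normal PDF between x1 and x2.
area, _ = integrate.quad(norm.pdf, x1, x2)

# The same probability via the cumulative distribution function.
p = norm.cdf(x2) - norm.cdf(x1)
print(area, p)  # both ~0.8186
```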

2. The Standard Normal Distribution (Z-scores)

To make probability calculations easier, we often use a standard normal distribution. This is a special kind of normal distribution with a mean of 0 and a standard deviation of 1. Any normal variable x is converted into a standard normal variable z through the following formula: z = (x – μ) / σ. The z-score represents how many standard deviations a value is away from the mean. This standardization allows us to use a single table, known as the standard normal table or Z-table, to find probabilities for any normal distribution.
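For example, standardization lets any normal probability be computed from the standard normal CDF — a sketch with hypothetical exam scores where μ = 70 and σ = 10:

```python
from scipy.stats import norm

mu, sigma, x = 70.0, 10.0, 85.0   # hypothetical exam-score parameters

z = (x - mu) / sigma              # 85 is 1.5 standard deviations above the mean
p = norm.cdf(z)                   # P(X <= 85) = P(Z <= 1.5)

# Equivalently, without standardizing by hand:
p_direct = norm.cdf(x, loc=mu, scale=sigma)
```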

[Figure: cumulative normal probability]

3. Using the Normal Distribution Table (Table A1)

Let’s say you want to find the probability that Z is less than or equal to -0.76 (P(Z ≤ -0.76)). Here’s how to use the table:

    1. Find the Row: Look down the left column (z0) until you find the row labeled “-0.7”.

    2. Find the Column: Look across the top row (Δz) until you find the column labeled “.06”.

    3. Find the Intersection: Go to where the “-0.7” row and the “.06” column intersect. The value in that cell, 0.2236, is the probability you’re looking for.

      Therefore, P(Z ≤ -0.76) = 0.2236.

      [Figure: excerpt of Table A1]
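The table lookup can be cross-checked in software, assuming SciPy is available:

```python
from scipy.stats import norm

# P(Z <= -0.76), the value read from Table A1.
p = norm.cdf(-0.76)
print(round(p, 4))  # 0.2236
```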


4. Using Computer Software (Excel)

Software such as Excel makes the computation of normal probabilities even easier. The function NORMSDIST(z) returns the cumulative probability P(Z ≤ z) for a given z-score, and NORMSINV(probability) does the opposite: it returns the z-score corresponding to a given cumulative probability. (In current Excel versions these are NORM.S.DIST(z, TRUE) and NORM.S.INV(probability).)
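The same two operations exist in most statistical libraries; in Python, for instance, SciPy’s norm.cdf and norm.ppf play the roles of NORMSDIST and NORMSINV:

```python
from scipy.stats import norm

p = norm.cdf(1.96)    # like NORMSDIST(1.96): cumulative probability for a z-score
z = norm.ppf(0.975)   # like NORMSINV(0.975): z-score for a cumulative probability
print(p, z)           # ~0.975 and ~1.96
```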

Fitting the Normal Distribution to Frequency Data

1. Why Fit a Normal Distribution?

Fitting a normal distribution to data allows us to model the data, make predictions, and perform statistical inference. If the data is approximately normally distributed, we can use the properties of the normal distribution to estimate probabilities, identify outliers, and compare different datasets.

2. Fitting to a Continuous Frequency Distribution

To fit a normal distribution to continuous frequency data, first calculate the sample mean x̄ and sample standard deviation s of the data. These values serve as estimates of the population mean μ and population standard deviation σ. From these parameters, one can define the normal distribution that best fits the data.
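In practice this amounts to two summary statistics — a sketch, where the measurements are made up for illustration:

```python
import statistics

# Hypothetical thickness measurements (mm) of a metal part.
data = [2.1, 2.3, 2.2, 2.5, 2.4, 2.3, 2.2, 2.6, 2.3, 2.4]

x_bar = statistics.mean(data)   # estimate of the population mean mu
s = statistics.stdev(data)      # estimate of sigma (n - 1 denominator)
print(x_bar, s)
```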

3. Fitting to a Discrete Frequency Distribution

When approximating a discrete distribution, such as the binomial, with a continuous distribution, such as the normal, we must make a correction for continuity. The reason is that the continuous normal distribution assigns positive probability to all values, including those between the discrete values of the binomial distribution. The correction is applied by adjusting the boundaries of the intervals used for the normal approximation to account for the discrete nature of the binomial data. For example, if we want to approximate the probability of X = 5 in a binomial distribution, we use the interval 4.5 < X < 5.5 in the normal approximation. This way, the area under the normal curve accurately represents the probability of the discrete outcome.

Normal Approximation to a Binomial Distribution

1. When to Use the Normal Approximation

The normal distribution may be used to approximate the binomial distribution under specific conditions. This is useful because binomial calculations become cumbersome when the number of trials n is large. The approximation is generally suitable if both np and n(1 − p) are greater than or equal to 5, though similar rules of thumb exist. Here n is the number of trials and p is the probability of success in each trial.

2. Conditions for Approximation (np and nq)

As mentioned above, a good normal approximation to the binomial requires that both np and n(1 − p) (often written nq, where q = 1 − p) be at least some threshold, usually 5 or 10. This ensures the binomial distribution is sufficiently bell-shaped and symmetrical to be well approximated by the normal.

3. Applying the Correction for Continuity

Because the binomial distribution is discrete and the normal distribution is continuous, the correction for continuity is essential when using the normal approximation.

Here is an example:

Assume we are tossing a fair coin 10 times (n = 10, p = 0.5) and want to find the probability of getting exactly 5 heads (X = 5). We can approximate this by using the normal distribution.

    1. Without Correction: If we directly used the normal distribution to approximate P(X = 5), we’d basically be finding the area under the normal curve at the single point 5. The area at a single point is zero for a continuous distribution, which would not make sense for the binomial probability.

    2. With Correction: The continuity correction tells us to use an interval around 5. Since the binomial variable is discrete (whole numbers), we use the continuous normal distribution to estimate the probability that X falls between 4.5 and 5.5.

So, we calculate P(4.5 < X < 5.5) using the normal distribution.

After applying the continuity correction, you will then transform the X values to Z-scores using the formula Z = (X – μ) / σ, where μ and σ are the mean and standard deviation of the approximating normal distribution, which depend on the binomial parameters n and p. You can then use the Z-table or software to determine the probabilities.
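The coin example can be checked end to end — a sketch assuming SciPy, where binom.pmf gives the exact answer to compare against:

```python
from math import sqrt

from scipy.stats import binom, norm

n, p = 10, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))   # mean 5, standard deviation ~1.58

exact = binom.pmf(5, n, p)                  # exact P(X = 5) = 252/1024

# Continuity correction: P(X = 5) ~ area under the normal curve on (4.5, 5.5).
approx = norm.cdf(5.5, mu, sigma) - norm.cdf(4.5, mu, sigma)
print(exact, approx)  # ~0.2461 vs ~0.2482
```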

4. Example for Approximating Binomial Probabilities

Suppose we’re conducting a survey and want to know the probability that at most 60% of 50 randomly selected people support a particular candidate. If we assume that each person’s support is independent and the probability of any individual supporting the candidate is 0.5 (a simplifying assumption for this example), this is a binomial problem with n = 50 trials and a probability of “success” (supporting the candidate) p = 0.5. We want to find P(X ≤ 30), where X is the number of people supporting the candidate (since 60% of 50 is 30).

Directly calculating this binomial probability would involve a lot of computation. Since np = 50 * 0.5 = 25 and n(1-p) = 50 * 0.5 = 25 are both comfortably greater than 5, we can use the normal distribution to approximate.

    1. Calculate Mean and Standard Deviation:

      The mean (μ) and standard deviation (σ) of the binomial distribution are given by:

      • μ = np = 50 * 0.5 = 25
      • σ = sqrt(np(1-p)) = sqrt(50 * 0.5 * 0.5) = sqrt(12.5) ≈ 3.54
    2. Apply the Correction for Continuity:

      Since we want P(X ≤ 30), we adjust the value 30 upward by 0.5 to 30.5, giving P(X < 30.5). (For a continuous distribution, P(X < 30.5) and P(X ≤ 30.5) are the same, so the strictness of the inequality does not matter.)

    3. Calculate the Z-score:

      Convert 30.5 to a z-score using the formula: z = (x – μ) / σ

      z = (30.5 – 25) / 3.54 ≈ 1.56

    4. Find the Probability Using the Z-table:

      We want P(X < 30.5), which is equivalent to P(Z < 1.56).

If you look at the excerpt of the Z-table shown earlier, 1.56 does not appear, because that excerpt covers only part of the table. In a real-world problem, you’d have a full Z-table or use software.

Let’s assume that a full Z-table shows P(Z < 1.56) ≈ 0.9406.

Therefore, P(X ≤ 30) ≈ P(Z < 1.56) ≈ 0.9406

Conclusion:

The approximate probability that at most 60% of the 50 people support the candidate is about 0.9406, or 94.06%.
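The whole calculation, and the exact binomial probability it approximates, can be verified in a few lines, assuming SciPy:

```python
from math import sqrt

from scipy.stats import binom, norm

n, p = 50, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 25 and ~3.54

exact = binom.cdf(30, n, p)         # exact P(X <= 30)
approx = norm.cdf(30.5, mu, sigma)  # normal approximation with continuity correction
print(exact, approx)                # both ~0.94
```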


[Figure: comparison of a binomial distribution with a normal distribution fitted to it]

This figure compares a binomial distribution with a normal distribution. The parameters of the binomial distribution are p = 0.4 and n = 20 (for instance, we might take samples of 20 items from a production line when the probability that any one item will require further processing is 0.4). To fit a normal distribution we need to know the mean and the standard deviation. Remember that the mean of a binomial distribution is μ = np, and that the standard deviation for that distribution is σ = √(np(1 − p)).

[Figure: comparison at n = 10 and p = 0.5]

Fitting the Normal Distribution to Cumulative Frequency Data

1. Cumulative Normal Probability

Cumulative probability refers to the probability of a random variable being less than or equal to a certain value. It’s the running total of probabilities as you move along the distribution.

2. Normal Probability Paper

Normal probability paper is designed so that if your data comes from a normal distribution, the plotted points will fall approximately along a straight line. Deviations from a straight line indicate that the data is not perfectly normally distributed.

These deviations reveal important information about how your data differs from a perfectly normal distribution.

2.1. What a Straight Line Indicates

If your data is perfectly normally distributed, the points on the normal probability plot will fall almost exactly along a straight diagonal line. This is because the quantiles of your data will match the quantiles of a standard normal distribution.

2.2 Deviations and Their Interpretations

When the points deviate from a straight line, it suggests that your data is not perfectly normal. Here are some common patterns and their interpretations:

    1. Curves:

      • Concave Upward (Bowing Upward): This often indicates positive skew (a long tail to the right). Your data has more extreme high values than a normal distribution would have.
      • Concave Downward (Bowing Downward): This suggests negative skew (a long tail to the left). Your data has more extreme low values than a normal distribution would have.
    2. S-Shapes:

      • Slight S-shape: A subtle S-shape can indicate that the tails of your distribution are thinner than those of a normal distribution. There are fewer extreme values than expected.
      • Pronounced S-shape: A more pronounced S-shape may suggest a platykurtic distribution (a flatter peak and thinner tails than the normal). It might also indicate a mixture of two distributions.
    3. Other Non-linear Patterns:

      • Irregular or Random Scatter: Some scatter is expected due to random variation. However, if the points deviate substantially and irregularly from the line, it could indicate that the data comes from a distribution that is quite different from normal. It might suggest multiple modes or other complexities.
      • Steps or Grouping: If the data shows distinct steps or groupings of points, it might indicate that the underlying data is discrete rather than continuous, or that the data has been rounded or categorized.

[Figure: normal probability paper]

By carefully examining the patterns on a normal probability plot, you can gain valuable insights into the shape of your data’s distribution and determine whether a normal distribution is an appropriate model.
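SciPy can generate the quantile pairs that normal probability paper plots, along with a straight-line fit; for data simulated from a normal distribution, the correlation r of that fit is very close to 1 (the simulated parameters below are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=200)   # simulated normal data

# probplot pairs the ordered data with theoretical normal quantiles and fits
# a line; for normal data slope ~ sigma, intercept ~ mu, and r is near 1.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(slope, intercept, r)
```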

Transformation of Variables to Give a Normal Distribution

1. Why Transform Data?

Data does not always follow a normal distribution. Because many statistical analyses assume normality, transforming your data can bring it closer to a normal distribution when it fails a normality check.

2. Common Transformations

If the original variable is x, forms of the new variable to try include log x and 1/x. The most common transformation is to replace x by ln x, log10 x, or the logarithm of x to any other base. Some common transformations include:

      • Logarithmic transformation (log x, ln x): Helpful for data that is positively skewed (long tail to the right) and where the variance increases with the mean.
      • Reciprocal transformation (1/x): Also useful for positively skewed data.
      • Square root transformation (√x): Useful for data that is moderately skewed and where the variance is proportional to the mean.
      • Arcsine transformation (arcsin √p): Often applied to proportions or percentages.

A transformation should be chosen based on the nature of the data, and choosing one usually requires some experience and a certain amount of trial and error. The only way to see whether a transformation was successful is to inspect the transformed data, for example with histograms or normal probability plots.

If the original variable shows a distribution which is not a normal distribution, it is very useful to try to change the variable so that the new form will follow a normal distribution. This strategy is often successful if the original distribution showed a single mode somewhere between the smallest and largest values of the variable, but the original distribution was not symmetrical.
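The effect of a log transformation on positively skewed data is easy to demonstrate — a sketch with simulated lognormal data, whose logarithm is exactly normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=0.8, size=1000)   # positively skewed sample

skew_before = stats.skew(x)          # clearly positive
skew_after = stats.skew(np.log(x))   # near zero after the transformation
print(skew_before, skew_after)
```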

Conclusion

The normal distribution is a foundation of statistical analysis, providing a powerful way to model and understand data. Its properties, from symmetry to the empirical rule, make it invaluable across many fields. Mastering its applications, including Z-tables and the normal approximation, is essential for anyone working with data.


About the Author

Principal Data Scientist

Meet Akash, a Principal Data Scientist with expertise in advanced analytics, machine learning, and AI-driven solutions. With a master’s degree from IIT Kanpur, Akash combines technical knowledge with industry insights to deliver impactful, scalable models for complex business challenges.