
Understanding Skewness and Kurtosis: Complete Guide

In this blog post, you'll learn not just the definitions of skewness and kurtosis but also how to interpret them in real-world data. We'll explore their types, how to calculate the skewness coefficient, and what excess kurtosis means. By the end of this post, you'll not only grasp these fundamental concepts but also be equipped to apply them in your data analysis, sharpening your ability to make informed decisions and predictions.


If you are a beginner, watch this Data Science Course to gain in-depth knowledge of the specialization.

What is a Normal Distribution?

A normal distribution is a way to describe how a set of data is spread out. It’s a type of continuous probability distribution for a random variable. A random variable is something that can have different values based on the outcomes of a random event. For example, if you flip a coin, it can land on heads or tails – a random event.

When you plot the probability of a random event, you get its probability distribution. This shows how likely different outcomes are. For a random variable, this distribution can take on an infinite number of values, forming a continuous curve.

When the continuous probability distribution curve is bell-shaped, i.e., it looks like a hill with a well-defined peak, it's called a normal distribution. The highest point of the curve (the peak) is where the average value lies, and the data is spread out symmetrically on either side of this peak. In a normal distribution, the three measures of central tendency, the mean, the median, and the mode, are either equal or very close to each other: the mean is the average, the median is the middle value when the data is sorted, and the mode is the most frequently occurring value.

Normal Distribution 

To visualize it, imagine measuring the heights of all the adults in a small town. Most people will have a height close to the average for the population. There will be fewer very tall or very short people. If you plot these heights on a graph, you’ll likely see a bell-shaped curve, with the peak representing the average height.
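As a rough sketch of this idea, the snippet below simulates such a set of heights with NumPy (the 170 cm mean and 8 cm spread are made-up illustrative values) and shows that the mean and median land almost on top of each other, as we expect for a normal distribution.

```python
import numpy as np

# Hypothetical example: simulate adult heights (in cm) from a normal distribution.
# The mean (170) and standard deviation (8) are illustrative values, not real survey data.
rng = np.random.default_rng(42)
heights = rng.normal(loc=170, scale=8, size=10_000)

print(round(np.mean(heights), 2))    # close to 170
print(round(np.median(heights), 2))  # almost identical to the mean, as expected for normal data
```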

To learn more check out Intellipaat’s Data Science Training.


What is Kurtosis?

In statistics, kurtosis helps us understand the shape of a dataset, particularly how its values are distributed in the tails and around the peak. Think of kurtosis as a way to measure the 'personality' of your data, focusing on the tails (the extreme values) and the peak (how tall the data piles up in the middle).

Kurtosis is about the tails and the peak:

  • Tails: These are the extreme values at the ends of your data. High kurtosis means more values in the tails.
  • Peak: This is about how tall and sharp the middle of your data is. High kurtosis means a taller, sharper peak.

Positive kurtosis indicates that your data has a lot of extreme values (heavy tails) and a high, narrow peak. On the other hand, negative kurtosis data has fewer extreme values (light tails) and a flatter peak.

When we talk about kurtosis in statistics, we’re focusing on two main aspects of a data distribution: tailedness and peakedness.

Tailedness refers to the frequency and extremity of outliers in a dataset; outliers are those unusual values that fall far from the majority of the data. Peakedness, on the other hand, describes how tightly data values cluster around the mean (average).
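To make the heavy-tails idea concrete, here is a minimal sketch using scipy.stats.kurtosis. The data and the handful of extreme values are invented for illustration; the point is simply that a few far-out observations push the kurtosis well above the value of roughly 3 seen for normal data.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
base = rng.normal(size=10_000)                            # roughly normal data
with_outliers = np.concatenate([base, [8, -9, 10, -11]])  # add a few extreme tail values

# fisher=False returns "plain" kurtosis, which is about 3 for a normal distribution
print(round(kurtosis(base, fisher=False), 2))           # close to 3
print(round(kurtosis(with_outliers, fisher=False), 2))  # noticeably larger: heavier tails
```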

Understanding Excess Kurtosis

Excess kurtosis is a concept from statistics and probability theory that lets us compare the "peakedness" and "tail thickness" of a dataset with those of a normal (bell-shaped) distribution. Since a normal distribution has a kurtosis value of 3, excess kurtosis is calculated by subtracting 3 from the kurtosis:

Excess kurtosis  =  Kurt – 3
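In Python, scipy.stats.kurtosis exposes exactly this convention: with fisher=True (the default) it reports excess kurtosis, and with fisher=False it reports plain kurtosis. A small sketch on simulated normal data confirms the two differ by 3:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
data = rng.normal(size=50_000)

kurt = kurtosis(data, fisher=False)   # plain kurtosis; about 3 for normal data
excess = kurtosis(data, fisher=True)  # excess kurtosis = kurtosis - 3; about 0 for normal data

print(round(kurt, 3), round(excess, 3))
print(np.isclose(excess, kurt - 3))   # True: the two values differ by exactly 3
```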

Types of Excess Kurtosis

Excess kurtosis can be positive (leptokurtic distribution), near zero (mesokurtic distribution), or negative (platykurtic distribution):

  • Positive (Leptokurtic) or Heavy-Tailed Distribution: If excess kurtosis is positive, your data has a sharper peak and fatter tails than a normal distribution. It means more extreme values.
  • Near Zero (Mesokurtic): If excess kurtosis is close to zero, your data closely resembles a normal distribution in terms of peakedness and tail thickness.
  • Negative (Platykurtic) or Short-Tailed Distribution: Negative excess kurtosis indicates a flatter peak and thinner tails. This suggests fewer extreme values.

Positive (Leptokurtic) or Heavy-Tailed Distribution 

Leptokurtic distributions have really long and heavy tails. This means that it's more common to find unusual values, or outliers, in the data. A positive excess kurtosis (kurtosis greater than 3) tells us that the distribution has a tall peak and thick tails at each end. When the value is very high, a lot of data points sit in these tails, far away from the average, rather than close to it.

 (Kurtosis > 3)

Near Zero (Mesokurtic)

Mesokurtic distributions are similar to a normal distribution: their kurtosis value is close to 3, so their excess kurtosis is close to 0. In these distributions, the spread of data is moderate, not too wide or narrow, and the peak of the curve is of medium height, not too tall or too flat.

(Kurtosis = 3)

Excess kurtosis = 3 – 3 = 0

Negative (Platykurtic) or Short-Tailed Distribution

Platykurtic distributions have tails that aren’t too thick and they spread out more around the middle. This means that most of the data points are not too far from the average. When you compare it to a normal distribution, a platykurtic distribution looks flatter and doesn’t peak as sharply.

(Kurtosis < 3)
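The three cases are easy to see on simulated data. The sketch below uses a Laplace sample (heavy tails), a normal sample, and a uniform sample (thin tails) as stand-ins for leptokurtic, mesokurtic, and platykurtic data; the distribution choices are just convenient illustrations.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(7)
samples = {
    "Laplace (leptokurtic)": rng.laplace(size=100_000),   # heavy tails -> positive excess kurtosis (~ +3)
    "Normal (mesokurtic)":   rng.normal(size=100_000),    # excess kurtosis near 0
    "Uniform (platykurtic)": rng.uniform(size=100_000),   # thin tails -> negative excess kurtosis (~ -1.2)
}

for name, data in samples.items():
    print(f"{name:22s} excess kurtosis = {kurtosis(data, fisher=True):+.2f}")
```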

What is Skewness?

Skewness is a measure that tells us how much a dataset deviates from a normal distribution, which is a perfectly symmetrical bell-shaped curve. In simpler terms, it shows whether the data points tend to cluster more on one side.

If the distribution's tail is longer on the right side, the data is positively skewed, which means there are a few unusually high values. If the tail is longer on the left side, the data is negatively skewed, indicating a few unusually low values.

If the data for a particular characteristic (like age or income) in your study isn’t evenly spread out and leans more towards one end, this can cause problems. Depending on the method you’re using to analyze your data, this unevenness, called skewness, might break some basic rules of that method or make it harder to understand how important this characteristic really is in your study.

In a skewed data set, the middle 50% of the values still lies between the first quartile (Q1) and the third quartile (Q3), but the longer tail pulls the mean away from the median.

Understanding skewness is easier when you consider a normal distribution, where data is evenly spread out. The skewness is zero in such a symmetrical distribution because all the central measures, like the mean and median, are exactly in the middle.

Mean = Median = Mode

However, what happens when the distribution isn’t symmetrical? In such cases, that data is called asymmetrical, and this is where the concept of skewness comes into play.

Check out our blog on Data Science tutorials to learn more about it.

Types of Skewness

There are two types of skewness used in data analytics; both are elaborated on below.

Understanding Positively Skewed Distribution

Mean > Median > Mode

Positive skewness means the data stretches out more towards the right side, kind of like a long tail on the right. This type of distribution is called right-skewed. When you measure this skewness, the number you get is bigger than zero. Imagine looking at a graph of this data: the average (mean) value is usually the highest, followed by the middle value (median), and then the most common value (mode).

So why is this happening?

The answer is that the long right tail pulls the data distribution towards the right. This makes the average (mean) larger than the middle value (median) and shifts it to the right. The most common value (mode) sits at the peak of the distribution, which lies to the left of the median. As a result, the order is: mode < median < mean.
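A quick numeric check of this ordering, using a made-up right-skewed "income" sample drawn from a lognormal distribution (the parameters are purely illustrative):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)
incomes = rng.lognormal(mean=10, sigma=0.6, size=100_000)  # hypothetical right-skewed incomes

print(f"skewness = {skew(incomes):.2f}")      # positive, so the data is right-skewed
print(f"mean   = {np.mean(incomes):,.0f}")    # largest of the three
print(f"median = {np.median(incomes):,.0f}")  # smaller than the mean
# The mode (the peak of the curve) sits further left still, so mode < median < mean.
```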

Right Skewed

Looking at the boxplot above, you'll notice that the second quartile (Q2), which is the median, is closer to the first quartile (Q1). This represents a positively skewed distribution, and in quartile terms it means:

Q3 – Q2 > Q2 – Q1

In this situation, it was pretty straightforward to identify the skewness in the data. But what if we come across a scenario like the following:

Positive skewness

In this example, the distances between Q2 and Q1 and Q3 and Q2 are the same, but the distribution still shows positive skewness. Those with a sharp eye will observe that the right whisker (the line extending from the box) is longer than the left. This longer right whisker indicates that the data is positively skewed.

Therefore, the first step should always be to compare the distances between Q2-Q1 and Q3-Q2. If they are equal, the next thing to check is the length of the whiskers.
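Here is a small sketch of that two-step check with NumPy, on a simulated right-skewed (exponential) sample; the 1.5 × IQR whisker rule mirrors how a standard boxplot is usually drawn.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.exponential(scale=2.0, size=10_000)  # simulated right-skewed sample

# Step 1: compare the quartile gaps
q1, q2, q3 = np.percentile(data, [25, 50, 75])
print(q3 - q2 > q2 - q1)  # True here: Q3 - Q2 > Q2 - Q1 points to positive skew

# Step 2: if the gaps were equal, compare the whisker lengths (1.5 * IQR rule)
iqr = q3 - q1
upper_whisker = data[data <= q3 + 1.5 * iqr].max() - q3
lower_whisker = q1 - data[data >= q1 - 1.5 * iqr].min()
print(upper_whisker > lower_whisker)  # a longer right whisker also points to positive skew
```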

Understanding Negative Skewed Distribution

As you might have already guessed, a negatively skewed distribution is one where the long tail extends to the left, known as left-skewed. For such distributions, the skewness value is less than zero. As shown in the figure mentioned earlier, in a negatively skewed distribution, the arrangement of central measures follows this pattern: mean < median < mode.

Negative Skewed

In the boxplot, when we’re looking at negative skewness, the way the quartiles relate to each other can be described as follows:

Q3 – Q2 < Q2 – Q1

Similar to what we did earlier, if the differences Q3 – Q2 and Q2 – Q1 are equal, then our next step is to check the lengths of the whiskers. If the left whisker is longer than the right one, it's a sign that the data is negatively skewed.

Negative Skewed

Go through these Data Science Interview Questions and Answers to excel in your interview.

How to Calculate the Skewness Coefficient

Skewness can be calculated through several techniques, with Pearson’s coefficient being the most commonly used method.

Pearson’s first coefficient of Skewness

To figure out the skewness, first find the difference between the average value (mean) and the most common value (mode). Then, divide this difference by the standard deviation, which tells you how spread out the data is.

Pearson's first coefficient of skewness: Sk1 = (Mean – Mode) / Standard Deviation

(Not to be confused with Pearson's correlation coefficient, which measures how strongly two variables are linearly related. Its value ranges from -1 (they move in opposite directions) to +1 (they move together perfectly), and 0 means there is no straight-line relationship. That coefficient is the covariance scaled by each variable's standard deviation, which keeps the value between -1 and +1 and makes it easier to interpret.)

Now, about Pearson’s first coefficient of skewness, it’s really handy when your data has a clear, most common value (high mode). But if your data doesn’t have a strong most common value, or if it has several, this method might not be the best. That’s where Pearson’s second coefficient of skewness comes in. It’s better in these situations because it doesn’t rely on finding the most common value.

Pearson’s second coefficient of Skewness

For Pearson’s second coefficient of skewness, take the mean and subtract the median, multiply this result by 3, and then divide it by the standard deviation.

Pearson's second coefficient of skewness: Sk2 = 3 × (Mean – Median) / Standard Deviation
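Both coefficients are easy to compute by hand with NumPy. The tiny dataset below is invented for illustration; it is right-skewed, so both coefficients come out positive.

```python
import numpy as np

# Small hypothetical, right-skewed sample
data = np.array([2, 3, 3, 3, 4, 4, 5, 6, 8, 12])

mean = np.mean(data)      # 5.0
median = np.median(data)  # 4.0
values, counts = np.unique(data, return_counts=True)
mode = values[np.argmax(counts)]   # 3, the most frequent value
std = np.std(data, ddof=1)         # sample standard deviation

sk1 = (mean - mode) / std          # Pearson's first (mode) skewness coefficient
sk2 = 3 * (mean - median) / std    # Pearson's second (median) skewness coefficient

print(f"Pearson's first coefficient:  {sk1:.2f}")   # about 0.66
print(f"Pearson's second coefficient: {sk2:.2f}")   # about 0.99
```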

Rule of thumb:

If the skewness value falls between -0.5 and 0.5, the data is almost symmetrical. When the skewness is between -1 and -0.5 (indicating a negative skew) or between 0.5 and 1 (indicating a positive skew), the data is somewhat skewed. If the skewness is less than -1 (showing a strong negative skew) or more than 1 (showing a strong positive skew), the data is highly skewed.
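The rule of thumb translates directly into a small helper. The sketch below uses scipy.stats.skew (moment-based skewness rather than Pearson's coefficients, but the same interpretation thresholds are commonly applied) together with the cut-offs stated above.

```python
import numpy as np
from scipy.stats import skew

def describe_skewness(values):
    """Label a sample using the rule-of-thumb thresholds above."""
    s = skew(values)
    if abs(s) < 0.5:
        label = "approximately symmetric"
    elif abs(s) <= 1:
        label = "moderately skewed"
    else:
        label = "highly skewed"
    return round(float(s), 2), label

rng = np.random.default_rng(9)
print(describe_skewness(rng.normal(size=5_000)))       # near 0 -> approximately symmetric
print(describe_skewness(rng.exponential(size=5_000)))  # around 2 -> highly skewed
```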

Conclusion

In this post, we've discussed the concepts of kurtosis and skewness extensively. We started with a foundational understanding of the normal distribution, the standard way data can be spread out. Then we examined kurtosis, learning how it shows whether data has heavy or light tails and how to gauge the degree of peakedness. We also looked at excess kurtosis, which helps us compare data to a normal distribution. Next, we explored skewness, which tells us whether data leans more to one side, and learned how to calculate the skewness coefficient. All this knowledge is really useful for understanding how data behaves, which is important in many areas like science, business, and beyond.

If you have any queries related to this domain, then you can reach out to us at Intellipaat’s Data Science Community!

Frequently Asked Questions

What is the simple difference between kurtosis and skewness?

Skewness measures the asymmetrical nature of a distribution, while kurtosis measures the thickness of a distribution’s tails in comparison to a normal distribution.

What is a normal kurtosis and skewness?

For kurtosis, a value of 3 represents a normal distribution (an excess kurtosis of 0), and for skewness, a value of 0 indicates a symmetric, normal-like distribution. Kurtosis above 3 indicates heavier tails and a sharper peak (leptokurtic), while values below 3 imply lighter tails and a flatter peak (platykurtic).

Why do we calculate kurtosis?

Kurtosis is calculated to understand the shape of a probability distribution. It helps assess the tail and peak of the distribution. High kurtosis indicates heavy tails, meaning extreme values are more likely, while low kurtosis suggests light tails and a more spread-out distribution. Analyzing kurtosis is useful in various fields, such as finance, statistics, and risk assessment, to better comprehend the characteristics of data distribution.

Is kurtosis a measure of shape?

Yes, kurtosis is a measure of the shape of a probability distribution. It specifically assesses the tails and the peak, indicating whether the data has heavier or lighter tails than a normal distribution. High kurtosis indicates a distribution with heavier tails and a sharper peak, while low kurtosis suggests lighter tails and a flatter peak.

What is the shape of a data distribution?

The shape of a data distribution can be symmetrical, skewed (left or right), uniform, bimodal (two peaks), or multimodal (more than two peaks). It reflects how the data is spread or clustered and gives insight into the data's overall characteristics.

