What Is Descriptive Statistics?

The purpose of descriptive statistics is to present a mass of data in a more understandable form. We may summarize the data in numbers as (a) some form of average, or in some cases a proportion, (b) some measure of variability or spread, and (c) quantities such as quartiles or percentiles, which divide the data so that certain percentages of the data are above or below these marks. Furthermore, we may choose to describe the data by various graphical displays or by the bar graphs called histograms, which show the distribution of data among various intervals of the varying quantity.

Central Location

Various “averages” are used to indicate a central value of a set of data. Some of these are referred to as means.

(a) Arithmetic Mean

Of these “averages,” the most common and familiar is the arithmetic mean, defined by
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
where x_1, x_2, ..., x_n are the n observations.

(b) Other Means

The geometric mean, logarithmic mean, and harmonic mean are all important in some areas of engineering. The geometric mean is defined as the nth root of the product of n observations:
$$\text{Geometric mean} = \sqrt[n]{x_1 x_2 \cdots x_n}$$

The logarithmic mean of two positive numbers is given by the difference between the two numbers, divided by the difference of their natural logarithms. It is used particularly in heat transfer and mass transfer.
$$\text{Logarithmic mean} = \frac{x_1 - x_2}{\ln x_1 - \ln x_2}$$
The harmonic mean involves inverses, i.e., one divided by each of the quantities. The harmonic mean is the inverse of the arithmetic mean of all the inverses:
$$\text{Harmonic mean} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
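
The four means described above can be compared side by side with a short Python sketch. The function names and the sample values are illustrative assumptions, not part of the original text, and the sketch assumes all observations are positive.

```python
import math


def arithmetic_mean(xs):
    """Sum of the observations divided by their count."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """nth root of the product of n positive observations."""
    return math.prod(xs) ** (1 / len(xs))


def harmonic_mean(xs):
    """Inverse of the arithmetic mean of the inverses (nonzero observations)."""
    return len(xs) / sum(1 / x for x in xs)


def logarithmic_mean(a, b):
    """(a - b) / (ln a - ln b) for positive a and b; taken as a when a == b."""
    if a == b:
        return a
    return (a - b) / (math.log(a) - math.log(b))


data = [2.0, 4.0, 8.0]  # illustrative values, not from the text
print("arithmetic :", arithmetic_mean(data))
print("geometric  :", geometric_mean(data))
print("harmonic   :", harmonic_mean(data))
print("logarithmic:", logarithmic_mean(2.0, 8.0))
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.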

(c) Median

Another representative quantity, quite different from a mean, is the median. If all the items with which we are concerned are sorted in order of increasing magnitude (size), from the smallest to the largest, then the median is the middle item. Consider the five items: 12, 13, 21, 27, 31. Then 21 is the median. If the number of items is even, the median is given by the arithmetic mean of the two middle items. Consider the six items: 12, 13, 21, 27, 31, 33.
The median is (21 + 27) / 2 = 24. One desirable property of the median is that it is not much affected by outliers.
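
As a minimal sketch, the rule above can be written in Python and checked against the two examples. The function name `median` and the third data set, which replaces 31 with an outlier, are illustrative assumptions.

```python
def median(items):
    """Middle item of the sorted data, or the mean of the two middle items."""
    if not items:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(items)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([12, 13, 21, 27, 31]))      # 21
print(median([12, 13, 21, 27, 31, 33]))  # 24.0
print(median([12, 13, 21, 27, 3100]))    # 21, unchanged by the outlier
```

The last call shows the robustness mentioned above: replacing 31 with 3100 changes the arithmetic mean drastically but leaves the median at 21.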

(d) Mode

If the frequency varies from one item to another, the mode is the value which appears most frequently. In the case of continuous variables the frequency depends upon how many digits are quoted, so the mode is more usefully considered as the midpoint of the class with the largest frequency.
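
Both cases can be sketched in Python. The helper names, the class boundaries (`start`, `width`), and the sample values are assumptions chosen for illustration. Ties are reported as a list in the discrete case and resolved toward the lowest class in the continuous case.

```python
import math
from collections import Counter


def mode_discrete(values):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def modal_class_midpoint(values, start, width):
    """Midpoint of the class [start + k*width, start + (k+1)*width) holding most values."""
    counts = Counter(math.floor((v - start) / width) for v in values)
    # Highest count wins; on a tie the lowest class index is chosen.
    k = max(counts, key=lambda idx: (counts[idx], -idx))
    return start + (k + 0.5) * width


print(mode_discrete([3, 5, 5, 7, 7, 7, 9]))  # [7]
print(mode_discrete([1, 1, 2, 2, 3]))        # [1, 2]

readings = [2.31, 2.48, 2.52, 2.55, 2.61, 2.74, 2.90]  # illustrative
print(modal_class_midpoint(readings, start=2.0, width=0.25))  # 2.625
```

In the continuous example no reading repeats, so a value-based mode is meaningless, while the class from 2.50 to 2.75 holds four of the seven readings and its midpoint serves as the mode.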


Variability or Spread of the Data

(a) Sample Range

One simple measure of variability is the sample range, the difference between the smallest item and the largest item in each sample. For small samples all of the same size, the sample range is a useful quantity. However, it is not a good indicator if the sample size varies, because the sample range tends to increase with increasing sample size. Its other major drawback is that it depends on only two items in each sample, the smallest and the largest, so it does not make use of all the data.
This disadvantage becomes more serious as the sample size increases. Because of its simplicity, the sample range is used frequently in quality control when the sample size is constant; simplicity is particularly desirable in this case so that people do not need much education to apply the test.
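
A small Python sketch illustrates both points: the range is trivial to compute, and it can only stay the same or grow as more items are added to a sample. The measurements below are made-up integers used purely for illustration.

```python
def sample_range(sample):
    """Difference between the largest and smallest item in the sample."""
    return max(sample) - min(sample)


# Illustrative measurements, taken in the order they were recorded.
data = [102, 105, 99, 101, 100, 104, 103, 98, 107]

# Range of the first k items for k = 2, 3, ..., len(data).
running = [sample_range(data[:k]) for k in range(2, len(data) + 1)]
print(running)  # [3, 6, 6, 6, 6, 6, 7, 9]

# Constant-size samples, as in quality control: three samples of three items.
samples = [data[i:i + 3] for i in range(0, len(data), 3)]
print([sample_range(s) for s in samples])  # [6, 4, 9]
```

Because the running range never decreases, ranges from samples of different sizes are not directly comparable, which is why the constant sample size matters.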

(b) Interquartile Range

The interquartile range is the difference between the upper quartile and the lower quartile. It is used fairly frequently as a measure of variability, particularly in the box plot. It is used less than some alternatives because it is not related to any of the important theoretical distributions.
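
One way to compute the interquartile range is with `statistics.quantiles` from the Python standard library, as sketched below. Quartile conventions differ between textbooks and software, so choosing `method="inclusive"` is an assumption made here; other conventions can give slightly different cut points. The data are illustrative.

```python
import statistics

data = [2, 4, 4, 5, 7, 8, 9, 11, 15]  # illustrative values

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print("lower quartile:", q1)
print("upper quartile:", q3)
print("interquartile range:", iqr)
```

Unlike the sample range, the interquartile range ignores the most extreme items, so a single outlier such as 15 has little effect on it.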

(c) Mean Deviation from the Mean

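The mean deviation from the mean, defined as

$$\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})$$

where $\bar{x}$ is the arithmetic mean of the n items, is useless as a measure of spread because it is always zero: the positive and negative deviations cancel exactly, which follows from the definition of the mean.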

(d) Mean Absolute Deviation from the Mean

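However, the mean absolute deviation from the mean, defined as

$$\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$

where $\bar{x}$ is the arithmetic mean of the n items, takes the absolute value of each deviation, so positive and negative deviations cannot cancel and the result is a usable measure of spread.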

Its disadvantage is that it is not simply related to the parameters of theoretical distributions.
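
The difference between the signed and the absolute deviation can be checked numerically. The sketch below reuses the five items from the median example; the function names are illustrative assumptions.

```python
def mean_deviation(xs):
    """Mean of the signed deviations from the arithmetic mean."""
    m = sum(xs) / len(xs)
    return sum(x - m for x in xs) / len(xs)


def mean_absolute_deviation(xs):
    """Mean of the absolute deviations from the arithmetic mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


data = [12, 13, 21, 27, 31]  # arithmetic mean is 20.8

print(mean_deviation(data))           # zero, up to floating-point rounding
print(mean_absolute_deviation(data))  # approximately 6.64
```

The signed deviations sum to zero for any data set, so only the absolute version carries information about spread.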

(e) Variance 

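Variance is defined as

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

where μ is the mean of a population of N measurements. It is the mean of the squares of the deviations of each measurement from the mean of the population. Since the square of any real number is non-negative, the variance is never negative, and it is zero only when every measurement equals the mean.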

(f) Standard Deviation

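The standard deviation is extremely important. It is defined as the square root of the variance:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

where μ is the population mean and N is the number of measurements. Thus, it has the same units as the original data and is representative of the size of the deviations from the mean.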
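
The population variance and standard deviation can be sketched in a few lines of Python and cross-checked against `statistics.pvariance` and `statistics.pstdev` from the standard library. The data set is an illustrative assumption, chosen so that the results are whole numbers.

```python
import math
import statistics


def population_variance(xs):
    """Mean of the squared deviations from the population mean."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def population_std(xs):
    """Square root of the population variance; same units as the data."""
    return math.sqrt(population_variance(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values with mean 5

print(population_variance(data))  # 4.0
print(population_std(data))       # 2.0

# Cross-check against the standard library.
print(statistics.pvariance(data), statistics.pstdev(data))
```

When the data are a sample rather than the whole population, the usual estimate divides by n - 1 instead of N; `statistics.variance` and `statistics.stdev` follow that convention.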

(g) Coefficient of Variation

A dimensionless quantity, the coefficient of variation is the ratio between the standard deviation and the mean for the same set of data, expressed as a percentage. This can be either (σ / μ) for a population or (s / x̄) for a sample, whichever is appropriate, multiplied by 100%.
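
A minimal Python sketch of this ratio follows. The function name, the `population` flag, and the data are assumptions made for illustration. The second call rescales the data by 1000 to show that the result does not depend on the units.

```python
import statistics


def coefficient_of_variation(xs, population=True):
    """Standard deviation divided by the mean, expressed as a percentage."""
    mean = statistics.fmean(xs)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    sd = statistics.pstdev(xs) if population else statistics.stdev(xs)
    return 100 * sd / mean


lengths_m = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative lengths in metres
lengths_mm = [x * 1000 for x in lengths_m]  # the same lengths in millimetres

print(coefficient_of_variation(lengths_m))   # about 40 (percent)
print(coefficient_of_variation(lengths_mm))  # same value, units cancel
```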


Quartiles, Deciles, Percentiles, and Quantiles

Quartiles, deciles, and percentiles divide a frequency distribution into a number of parts containing equal frequencies. The items are first put into order of increasing magnitude.

  • Quartiles divide the ordered values into four parts, each containing one quarter of the values. If an item falls exactly on a dividing line, half of it is counted in the group above and half in the group below.

  • Deciles divide the ordered values into ten parts, each containing one tenth of the total frequency.

  • Percentiles divide the ordered values into a hundred parts, each containing one hundredth of the total frequency.

  • Quantiles are the general case: they divide a frequency distribution into parts containing stated proportions of the distribution.
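
The relationship between these dividing points can be sketched with `statistics.quantiles` from the Python standard library, where the parameter `n` sets the number of equal parts. The data set (the integers 0 to 100) and the choice of `method="inclusive"` are assumptions made for illustration; other interpolation conventions can shift the cut points slightly.

```python
import statistics

data = list(range(101))  # 0, 1, ..., 100 (illustrative)

# n equal parts are separated by n - 1 cut points.
quartiles = statistics.quantiles(data, n=4, method="inclusive")
deciles = statistics.quantiles(data, n=10, method="inclusive")
percentiles = statistics.quantiles(data, n=100, method="inclusive")

print(quartiles)          # [25.0, 50.0, 75.0]
print(len(deciles))       # 9
print(len(percentiles))   # 99

# The second quartile, fifth decile, and 50th percentile are all the median.
print(quartiles[1], deciles[4], percentiles[49])
```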
