What is Descriptive Statistics?
The purpose of descriptive statistics is to present a mass of data in a more understandable form. We may summarize the data in numbers as (a) some form of average or in some cases a proportion, (b) some measure of variability or spread, and (c) quantities such as quartiles or percentiles, which divide the data so that certain percentages of the data are above or below these marks.
We may also describe the data graphically, for example with histograms: bar graphs that show how the data are distributed among successive intervals of the quantity being measured.
Types of Descriptive Statistics
The main types of descriptive statistics are as follows:
- Measures of Central Tendency: These statistics locate the center of the data. The principal measures of central tendency are:
  - Mean: The arithmetic average of all values in the dataset, found by summing the values and dividing by their count.
  - Median: The middle value when the data are sorted. For a dataset with an even number of values, the median is the average of the two central values.
  - Mode: The most frequently occurring value in the dataset. A dataset can have one mode, several modes, or none.
- Measures of Dispersion: These statistics indicate how spread out the data are. Common measures of dispersion are:
  - Range: The difference between the maximum and minimum values. It gives a rough idea of spread but is sensitive to outliers.
  - Variance: The average squared deviation of the values from the mean. Higher variance signifies more variability.
  - Standard Deviation: The square root of the variance. Because it is in the same units as the data, it is easier to interpret than the variance.
  - Interquartile Range (IQR): The difference between the third quartile (75th percentile) and the first quartile (25th percentile). It is less influenced by outliers than the range.
- Measures of Skewness and Kurtosis: Skewness gauges the asymmetry of a distribution, while kurtosis measures its “tailedness.”
  - Skewness: Negative skew denotes a distribution with a longer left tail; positive skew denotes a longer right tail.
  - Kurtosis: It reflects how heavy the tails of the distribution are compared with a normal distribution.
- Frequency Distribution: A table or graph showing how often each value, or each range of values, occurs. It gives an overview of how the data are distributed.
- Percentiles and Quartiles: A percentile is the value below which a given percentage of the data falls. Quartiles divide the sorted data into four segments, each containing 25% of the data.
- Graphical Representations: Visualizations such as histograms, box plots, and scatter plots depict the distribution of the data, outliers, and relationships between variables.
Descriptive statistics play a pivotal role in succinctly summarizing data, uncovering patterns, and gaining dataset insights. However, they don’t delve into causation or prediction; they offer a snapshot of data attributes.
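Skewness and kurtosis are the least familiar items in the list above, so a small sketch may help. It assumes the common moment-based definitions (third and fourth central moments scaled by powers of the second) and reports excess kurtosis, which is zero for a normal distribution. The function names and the sample data are illustrative only.

```python
from statistics import fmean


def central_moment(values, k):
    """k-th central moment: the mean of (x - mean)**k over all values."""
    m = fmean(values)
    return fmean([(x - m) ** k for x in values])


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (population form)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


symmetric = [1, 2, 3, 4, 5]               # made-up, symmetric about 3
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]  # made-up, one large value on the right

print("skewness (symmetric):   ", skewness(symmetric))
print("skewness (right-tailed):", skewness(right_tailed))
print("excess kurtosis (symmetric):", excess_kurtosis(symmetric))
```

Statistical packages often apply small-sample corrections to these quantities, so their output may differ slightly from this uncorrected form.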
Descriptive Statistics Formulas
The formulas for the main descriptive statistics are given below.
(a) Sample Range
One simple measure of variability is the sample range, the difference between the smallest item and the largest item in each sample. For small samples all of the same size, the sample range is a useful quantity. However, it is not a good indicator if the sample size varies, because the sample range tends to increase with increasing sample size.
Its other major drawback is that it depends on only two items in each sample, the smallest and the largest, so it does not make use of all the data.
This disadvantage becomes more serious as the sample size increases. Because of its simplicity, the sample range is used frequently in quality control when the sample size is constant; simplicity is particularly desirable in this case so that people do not need much education to apply the test.
(b) Interquartile Range
The interquartile range is the difference between the upper quartile and the lower quartile. It is used fairly frequently as a measure of variability, particularly in the Box Plot. It is used less than some alternatives because it is not related to any of the important theoretical distributions.
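As a rough illustration of the two measures described so far, the sketch below computes the sample range and the interquartile range with Python's standard `statistics` module. The six values are borrowed from the median example later in this article. Several conventions exist for locating quartiles, so the IQR printed here reflects only the module's default ("exclusive") method.

```python
from statistics import quantiles

data = [12, 13, 21, 27, 31, 33]

# Sample range: largest item minus smallest item.
sample_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
# The default method is "exclusive"; method="inclusive" gives
# slightly different quartiles for the same data.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print("range:", sample_range)
print("quartiles:", q1, q2, q3)
print("IQR:", iqr)
```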
(c) Mean Deviation from the Mean
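The mean deviation from the mean is defined as

$$\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)$$

where $\bar{x}$ is the arithmetic mean of the $N$ items. Because the positive and negative deviations cancel exactly, this quantity is always zero and is therefore useless as a measure of variability.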
(d) Mean Absolute Deviation from the Mean
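However, the mean absolute deviation of the $N$ items $x_i$ from their mean $\bar{x}$, defined as

$$\frac{1}{N}\sum_{i=1}^{N}\left|x_i - \bar{x}\right|$$

does measure variability, since taking absolute values prevents the deviations from cancelling. Its disadvantage is that it is not simply related to the parameters of theoretical distributions.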
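The difference between the two deviations is easy to check numerically. The following sketch uses five arbitrary values: the signed deviations average to zero (up to floating-point rounding), while the absolute deviations do not.

```python
from statistics import fmean

data = [12, 13, 21, 27, 31]
mean = fmean(data)

# Signed deviations cancel, so their mean is zero apart from rounding error.
mean_deviation = fmean([x - mean for x in data])

# Absolute deviations cannot cancel, so their mean reflects the spread.
mean_abs_deviation = fmean([abs(x - mean) for x in data])

print("mean:", mean)
print("mean deviation:", mean_deviation)
print("mean absolute deviation:", mean_abs_deviation)
```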
(e) Variance
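The variance of a population is defined as

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2$$

where $\mu$ is the population mean. It is the mean of the squares of the deviations of each measurement from the mean of the population. Since the square of any real number, positive or negative, is never negative, the variance is never negative, and it is zero only when every measurement equals the mean. When only a sample is available, the variance is estimated by $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2$, where $\bar{x}$ is the sample mean.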
(f) Standard Deviation
The standard deviation is extremely important. It is defined as the square root of the variance:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$

Thus, it has the same units as the original data and is representative of the deviations from the mean.
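A short sketch can make the definitions concrete. It computes the population variance and standard deviation directly from the definition (divide by the number of items) and compares them with `pvariance` and `pstdev` from Python's `statistics` module. The sample versions, which divide by one less than the number of items, are shown for contrast. The data values are arbitrary.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [12, 13, 21, 27, 31]
n = len(data)

# Population variance from the definition: mean of squared deviations.
mu = sum(data) / n
pop_var = sum((x - mu) ** 2 for x in data) / n
pop_sd = math.sqrt(pop_var)

print("from definition:", pop_var, pop_sd)
print("statistics module:", pvariance(data), pstdev(data))

# Sample versions divide by n - 1, so they are slightly larger.
print("sample variance and sd:", variance(data), stdev(data))
```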
(g) Coefficient of Variation
A dimensionless quantity, the coefficient of variation is the ratio between the standard deviation and the mean for the same set of data, expressed as a percentage. This can be either $(\sigma / \mu)$ or $(s / \bar{x})$, whichever is appropriate, multiplied by 100%.
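Because the coefficient of variation is dimensionless, it should not change when the same measurements are expressed in different units. The sketch below checks this with made-up heights recorded in centimetres and in metres. The helper name `coefficient_of_variation` is an assumption for illustration, and the sample form (s divided by the sample mean) is used.

```python
from statistics import fmean, stdev


def coefficient_of_variation(sample):
    """Sample coefficient of variation in percent: 100 * s / mean."""
    mean = fmean(sample)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * stdev(sample) / mean


heights_cm = [150, 160, 170, 180]   # made-up data
heights_m = [1.5, 1.6, 1.7, 1.8]    # the same data in metres

print(coefficient_of_variation(heights_cm))
print(coefficient_of_variation(heights_m))
```

The two printed values agree apart from floating-point rounding, whereas the standard deviations themselves differ by a factor of 100.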
(h) Arithmetic Mean
The most common and familiar average is the arithmetic mean, defined by

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
(i) Other Means
The geometric mean, logarithmic mean, and harmonic mean are all important in some areas of engineering. The geometric mean is defined as the nth root of the product of n observations:
Geometric mean = $\sqrt[n]{x_1 x_2 \cdots x_n}$
The logarithmic mean of two numbers is given by the difference between the two numbers, divided by the difference of their natural logarithms. It is used particularly in heat transfer and mass transfer.
Logarithmic mean = $\dfrac{x_1 - x_2}{\ln x_1 - \ln x_2}$
The harmonic mean involves inverses, i.e., one divided by each of the quantities. The harmonic mean is the inverse of the arithmetic mean of all the inverses.
Harmonic mean = $\dfrac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}}$
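The four means can be compared on a pair of numbers with a short sketch. The geometric and harmonic means come from Python's `statistics` module. The `logarithmic_mean` helper is written here for illustration and takes the logarithmic mean as the difference of the two numbers divided by the difference of their natural logarithms, the form usual in heat-transfer work.

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean


def logarithmic_mean(a, b):
    """Logarithmic mean of two positive numbers: (a - b) / (ln a - ln b)."""
    if a <= 0 or b <= 0:
        raise ValueError("logarithmic mean requires positive numbers")
    if a == b:
        return float(a)  # limiting value when the two numbers coincide
    return (a - b) / (math.log(a) - math.log(b))


values = [2.0, 8.0]

print("arithmetic: ", fmean(values))
print("logarithmic:", logarithmic_mean(*values))
print("geometric:  ", geometric_mean(values))
print("harmonic:   ", harmonic_mean(values))
```

For two distinct positive numbers the results fall in the order harmonic, geometric, logarithmic, arithmetic, from smallest to largest.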
(j) Median
Another representative quantity, quite different from a mean, is the median. If all the items with which we are concerned are sorted in order of increasing magnitude (size), from the smallest to the largest, then the median is the middle item. Consider the five items: 12, 13, 21, 27, 31. Then 21 is the median.
If the number of items is even, the median is given by the arithmetic mean of the two middle items. Consider the six items: 12, 13, 21, 27, 31, 33.
The median is (21 + 27) / 2 = 24. One desirable property of the median is that it is not much affected by outliers.
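The two cases above, and the median's resistance to outliers, can be reproduced with `statistics.median`. The list with the extreme value 3100 is a made-up variation on the five-item example.

```python
from statistics import fmean, median

odd_items = [12, 13, 21, 27, 31]
even_items = [12, 13, 21, 27, 31, 33]

print(median(odd_items))    # middle item of five
print(median(even_items))   # mean of the two middle items

# Replace the largest item with an extreme value: the median stays put,
# while the arithmetic mean is pulled far upward.
with_outlier = [12, 13, 21, 27, 3100]
print(median(with_outlier), fmean(with_outlier))
```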
(k) Mode
If the frequency varies from one item to another, the mode is the value that appears most frequently. In the case of continuous variables the frequency depends upon how many digits are quoted, so the mode is more usefully considered as the midpoint of the class with the largest frequency.
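Both views of the mode can be sketched in a few lines. For discrete values, `statistics.multimode` returns every value tied for the highest frequency. For continuous readings, the hypothetical helper `modal_class_midpoint` groups values into classes of a chosen width and returns the midpoint of the fullest class. The class width and the readings are assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import multimode


def modal_class_midpoint(values, width):
    """Midpoint of the class (bin) of the given width that holds the most values.

    Classes are [k*width, (k+1)*width). If several classes tie, the one
    encountered first in the data is returned.
    """
    counts = Counter(math.floor(x / width) for x in values)
    k, _ = counts.most_common(1)[0]
    return (k + 0.5) * width


# Discrete data: 22 appears twice, every other value once.
print(multimode([20, 21, 22, 22, 23, 24, 25]))

# Continuous readings (made up): the raw values never repeat,
# so the modal class is more informative than a raw mode.
readings = [20.1, 20.4, 21.7, 22.2, 22.3, 22.8, 23.5, 24.9]
print(modal_class_midpoint(readings, width=1.0))
```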
Applications of Descriptive Statistics
- Business and Economics:
  - Market Analysis: Descriptive statistics help businesses examine consumer preferences, buying behavior, and market trends by summarizing survey and sales data.
  - Financial Appraisal: They condense financial data into means, medians, and standard deviations of stock prices, interest rates, and economic indicators.
  - Performance Assessment: Businesses use descriptive statistics to measure workforce performance, monitor productivity, and track key performance indicators.
- Healthcare and Medicine:
  - Epidemiological Examination: Descriptive statistics summarize disease prevalence, mortality rates, and other healthcare data, helping researchers understand how diseases spread and what effects they have.
  - Clinical Trials: Researchers use descriptive statistics to summarize patient profiles, treatment outcomes, and adverse events in clinical trials.
  - Healthcare Administration: Medical institutions use descriptive statistics to evaluate patient demographics, waiting times, and resource allocation.
- Social Sciences:
  - Demographic Insight: Descriptive statistics describe population demographics, including age distribution, gender ratios, and ethnicity, which helps in understanding societal change.
  - Educational Context: They help educators analyze student achievement, grade distributions, and attendance trends, and so identify areas that need improvement.
  - Criminal Analysis: Descriptive statistics help law enforcement agencies understand crime frequencies, offense categories, and geographical distributions.
- Psychology:
  - Psychological Surveys: Psychologists use descriptive statistics to interpret survey data, experimental findings, and observational records related to behavior, emotions, and cognition.
  - Personality Traits: Researchers use descriptive statistics to analyze scores from personality assessments, identifying common attributes and variations within groups.
  - Clinical Psychology: Descriptive statistics summarize patient information, symptom intensity, and treatment outcomes in clinical settings.
- Environmental Science:
  - Environmental Monitoring: Descriptive statistics summarize data from sensors and monitoring stations, helping scientists track air quality, water purity, and other ecological variables.
  - Climate Study: They summarize temperature trends, precipitation patterns, and indicators of climate change over specified intervals.
  - Biodiversity Exploration: Descriptive statistics help in examining species distribution, population sizes, and ecological trends across habitats.
- Education:
  - Evaluation of Assessments: Descriptive statistics summarize the results of standardized tests, enabling educators to gauge how effective their teaching and curriculum are.
  - Student Advancement: Educators use descriptive statistics to chart student progress, identify students who are struggling, and adapt their teaching accordingly.
  - Program Evaluation: Educational institutions use descriptive statistics to assess the effectiveness of educational programs and interventions.
Descriptive Statistics Examples
The following examples apply descriptive statistics to small datasets.
Example 1: Descriptive statistics are used to summarize and describe the main features of a dataset. Let’s consider a simple example of a class of students and their exam scores:
- Student A: 85
- Student B: 72
- Student C: 90
- Student D: 78
- Student E: 94
Mean (Average) Score: (85 + 72 + 90 + 78 + 94) / 5 = 83.8
Median (Middle Score): Since there’s an odd number of data points, the median is the middle score of the sorted list (72, 78, 85, 90, 94), which is 85.
Mode (Most Common Score): Every score appears exactly once, so this dataset has no mode.
Range (Difference between Max and Min): 94 - 72 = 22
Standard Deviation: This measures the spread of the scores around the mean. Treating the five scores as the whole population gives approximately 7.96; the sample formula, which divides by 4 instead of 5, gives approximately 8.90.
Example 2: Let’s consider a dataset representing the monthly incomes of employees in a small company:
- Employee 1: $4500
- Employee 2: $3800
- Employee 3: $5200
- Employee 4: $4100
- Employee 5: $3700
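Mean Income: ($4500 + $3800 + $5200 + $4100 + $3700) / 5 = $4260
Median Income: Since there’s an odd number of data points, the median is the middle income of the sorted list ($3700, $3800, $4100, $4500, $5200), which is $4100.
Range of Incomes: $5200 - $3700 = $1500
Standard Deviation: Treating the five incomes as the whole population, the standard deviation is approximately $546.26; the sample formula, which divides by 4 instead of 5, gives approximately $610.74. Either value shows the variation in incomes around the mean.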
Example 3: Consider a dataset representing the temperatures (in Celsius) recorded over a week in a particular city:
- Monday: 22°C
- Tuesday: 23°C
- Wednesday: 24°C
- Thursday: 22°C
- Friday: 21°C
- Saturday: 20°C
- Sunday: 25°C
Mean Temperature: (22 + 23 + 24 + 22 + 21 + 20 + 25) / 7 = 157 / 7 ≈ 22.43°C
Median Temperature: Since there’s an odd number of data points, the median is the middle temperature of the sorted list (20, 21, 22, 22, 23, 24, 25), which is 22°C.
Mode of Temperatures: The mode is 22°C since it appears twice.
Range of Temperatures: 25°C - 20°C = 5°C
Standard Deviation: Treating the seven readings as the whole population, the standard deviation is approximately 1.59°C; the sample formula, which divides by 6 instead of 7, gives approximately 1.72°C. Either value indicates the variability in temperatures across the week.
These examples illustrate how descriptive statistics provide a clear and concise summary of datasets, helping us understand the central tendency, spread, and distribution of the data.
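Summaries like these are easy to recompute with Python's standard `statistics` module. The sketch below is one possible way to do it: the `summarize` helper and its dictionary keys are illustrative names, not part of any library. It prints both the population and the sample standard deviation, because the two conventions give different values, and it reports no mode when no value repeats.

```python
from statistics import fmean, median, multimode, pstdev, stdev


def summarize(values):
    """Return the summary statistics used in the examples for one dataset."""
    # Report modes only when at least one value actually repeats.
    has_repeat = len(set(values)) < len(values)
    return {
        "mean": fmean(values),
        "median": median(values),
        "modes": multimode(values) if has_repeat else [],
        "range": max(values) - min(values),
        "population_sd": pstdev(values),  # divides by n
        "sample_sd": stdev(values),       # divides by n - 1
    }


datasets = {
    "exam scores": [85, 72, 90, 78, 94],
    "monthly incomes": [4500, 3800, 5200, 4100, 3700],
    "temperatures": [22, 23, 24, 22, 21, 20, 25],
}

for name, values in datasets.items():
    print(name)
    for key, value in summarize(values).items():
        print(f"  {key}: {value}")
```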