

Frequency Distributions and Graphical Representations of Data

Stem-and-Leaf Displays

These simple displays are particularly suitable for exploratory analysis of fairly small sets of data. The basic ideas will be developed with an example.
Example
Data have been obtained on the lives of batteries of a particular type in an industrial application.
Table: Battery Lives, years shows the lives of 36 batteries, recorded to the nearest tenth of a year.

Table: Battery Lives, years

For these data we choose “stems” which are the main magnitudes. In this case the digit before the decimal point is a reasonable choice: 1, 2, 3, 4, 5, 6. Now we go through the data and put each “leaf,” in this case the digit after the decimal point, on its corresponding stem. The decimal point is not usually shown. The result can be seen in Table: Stem-and-Leaf Display. The number of leaves on each stem can be counted and shown under the heading of Frequency.

Table: Stem-and-Leaf Display

From the list of leaves on each stem we have an immediate visual indication of the relative numbers. We can see whether or not the distribution is approximately symmetrical, and we may get a preliminary indication of whether any particular theoretical distribution may fit the data.
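The construction described above can be sketched in a few lines of Python. This is only an illustration: the values in `sample` are hypothetical and are not the 36 battery lives from the table, and the function name `stem_and_leaf` is our own.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group one-decimal values by integer part (stem) and tenths digit (leaf)."""
    stems = defaultdict(list)
    for v in values:
        tenths = round(v * 10)           # e.g. 4.1 -> 41, avoids float truncation
        stem, leaf = divmod(tenths, 10)  # 41 -> stem 4, leaf 1
        stems[stem].append(leaf)
    # Sort the stems, and the leaves on each stem, for easier reading.
    return {s: sorted(stems[s]) for s in sorted(stems)}


# Hypothetical lives in years (assumption, not the article's data).
sample = [1.6, 2.2, 2.6, 3.1, 3.3, 3.4, 3.7, 4.1, 4.1, 4.5, 5.2, 6.3]
display = stem_and_leaf(sample)

for stem, leaves in display.items():
    # The trailing number in parentheses is the frequency for that stem.
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}  ({len(leaves)})")
```

Each printed row shows a stem, its leaves, and the frequency, so the shape of the distribution can be read directly from the lengths of the rows.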

Box Plots

A box plot, or box-and-whisker plot, is a graphical device for displaying certain characteristics of a frequency distribution. A narrow box extends from the lower quartile to the upper quartile. Thus the length of the box represents the interquartile range, a measure of variability. The median is marked by a line extending across the box. The smallest value in the distribution and the largest value are marked, and each is joined to the box by a straight line, the whisker. Thus, the whiskers represent the full range of the data.
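The five quantities that define such a box plot can be computed with Python's standard `statistics` module, as in the sketch below. The data are hypothetical, and the "inclusive" quartile method is an assumption: several conventions for quartiles exist and they give slightly different values on small samples.

```python
import statistics


def five_number_summary(values):
    """Return the five quantities a box plot displays."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "smallest": min(values),
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "largest": max(values),
    }


# Hypothetical lives in years (assumption, not the article's data).
sample = [1.6, 2.2, 2.6, 3.1, 3.3, 3.4, 3.7, 4.1, 4.1, 4.5, 5.2, 6.3]
summary = five_number_summary(sample)

# Length of the box and total span of the whiskers, respectively.
interquartile_range = summary["upper_quartile"] - summary["lower_quartile"]
data_range = summary["largest"] - summary["smallest"]

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print(f"interquartile range: {interquartile_range:.2f}")
print(f"range: {data_range:.2f}")
```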


Figure: Box Plot for Life of Battery is a box plot for the data of Table: Battery Lives, years, on the life of batteries under industrial conditions. The labels, “smallest”, “largest”, “median”, and “quartiles”, are usually omitted.

Figure: Box Plot for Life of Battery

Box plots are particularly suitable for comparing sets of data, such as before and after modifications were made in the production process. Figure: Comparison of Box Plots shows a comparison of the box plot of Figure: Box Plot for Life of Battery with a box plot for similar data under modified production conditions, both for the same sample size. Although the median has not changed very much, we can see that the sample range and the interquartile range for modified conditions are considerably smaller.

Figure: Comparison of Box Plots
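The same comparison can be made numerically by computing the median, range, and interquartile range for each sample. In this sketch both samples are hypothetical, chosen only so that the second is less spread out than the first; they are not the data behind the figure.

```python
import statistics


def spread(values):
    """Median, sample range, and interquartile range of one sample."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


# Hypothetical samples of equal size (assumption, not the article's data).
original = [1.6, 2.2, 2.6, 3.1, 3.3, 3.4, 3.7, 4.1, 4.1, 4.5, 5.2, 6.3]
modified = [2.8, 3.0, 3.1, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.1, 4.3]

results = {"original": spread(original), "modified": spread(modified)}
for label, stats in results.items():
    print(
        f"{label:9s} median={stats['median']:.2f} "
        f"range={stats['range']:.2f} iqr={stats['iqr']:.2f}"
    )
```

With these illustrative numbers the medians are close, while both measures of spread are smaller for the second sample, which is the pattern the box plots make visible at a glance.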

Frequency Graphs of Discrete Data

Table: Frequencies for Numbers of Defectives

Number of defectives, xi    Frequency, fi
0                           48
1                           10
2                           2
>2                          0

These data can be shown graphically in a very simple form because they involve discrete data, as opposed to continuous data, and only a few different values. The variate is discrete in the sense that only certain values are possible: in this case the number of defective items in a group of six must be an integer rather than a fraction. The number of defective items in each group of this example is only 0, 1, or 2. The frequencies of these numbers are shown in the table above and plotted in the figure below, where the isolated spikes correspond to the discrete character of the variate.
Figure: Distribution of Numbers of Defectives in Groups of Six Items
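As a rough sketch, the frequencies from the table can be turned into relative frequencies and a simple text chart. The frequencies below are copied from the table; the function name `spike_chart` and the choice of one asterisk per two observations are our own, and the spikes are drawn horizontally rather than vertically to keep the code short.

```python
# Frequencies copied from Table: Frequencies for Numbers of Defectives.
frequencies = {0: 48, 1: 10, 2: 2}

total = sum(frequencies.values())
relative = {x: f / total for x, f in frequencies.items()}


def spike_chart(freqs, scale=2):
    """One text row per discrete value; one '*' per `scale` observations."""
    return [f"{x} | {'*' * (f // scale)} {f}" for x, f in sorted(freqs.items())]


for line in spike_chart(frequencies):
    print(line)

for x, r in relative.items():
    print(f"relative frequency of {x} defectives: {r:.3f}")
```

Because the variate takes only the values 0, 1, and 2, each row stands alone, with nothing between neighbouring values.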

If the number of different values is very large, it may be desirable to use the grouped frequency approach.

Continuous Data: Grouped Frequency

If the variate is continuous, any value at all in an appropriate range is possible. Between any two possible values, there is an infinite number of other possible values, although measuring devices are not able to distinguish some of them from one another.
Measurements will be recorded to only a certain number of significant figures. Even to this number of figures, there will usually be a large number of possible values. If the number of possible values of the variate is large, too many appear in a table or graph for easy comprehension. We can make the data easier to comprehend by dividing the variate into intervals or classes and counting the frequency of occurrence for each class. This is called the grouped frequency approach.
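A minimal sketch of the grouped frequency approach follows. The data are hypothetical, and the class layout is an assumption: equal-width classes that include their lower boundary and exclude their upper boundary. Other boundary conventions are equally valid.

```python
def grouped_frequency(values, lower, width, n_classes):
    """Count values in equal-width classes [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} lies outside the class range")
        counts[index] += 1
    # Each entry is (class lower boundary, class upper boundary, frequency).
    return [
        (lower + i * width, lower + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


# Hypothetical measurements (assumption, not data from the article).
sample = [1.6, 2.2, 2.6, 3.1, 3.3, 3.4, 3.7, 4.1, 4.1, 4.5, 5.2, 6.3]
classes = grouped_frequency(sample, lower=1.0, width=1.0, n_classes=6)

for low, high, count in classes:
    print(f"{low:.1f} to under {high:.1f}: {count}")
```

The class width controls the trade-off: wider classes give a more compact table but hide more of the detail within each class.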


About the Author

Head of Data Engineering & Science

As Head of Data Engineering and Science at Chargebee, Birendra leads a team of 50+ engineers, specializing in high-scale data and ML platforms. He previously held key roles at Razorpay and as a CTO, with extensive experience in Big Data, ML, and SaaS architecture. He is recognized for 85+ contributions to tech publications and patents.