Frequency Distribution and Graphical Representation of Data
Stem-and-leaf displays are simple displays that are particularly suitable for exploratory analysis of fairly small sets of data. The basic ideas will be developed with an example.
Data have been obtained on the lives of batteries of a particular type in an industrial application.
Table: Battery Lives, years shows the lives of 36 batteries, recorded to the nearest tenth of a year.
Table: Battery Lives, years
For these data we choose “stems,” which are the main magnitudes. In this case the digit before the decimal point is a reasonable choice: 1, 2, 3, 4, 5, 6. Now we go through the data and put each “leaf,” in this case the digit after the decimal point, on its corresponding stem. The decimal point is not usually shown. The result can be seen in Table: Stem-and-Leaf Display. The number of leaves on each stem can be counted and shown under the heading of Frequency.
Table: Stem-and-Leaf Display
From the list of leaves on each stem we have an immediate visual indication of the relative numbers. We can see whether or not the distribution is approximately symmetrical, and we may get a preliminary indication of whether any particular theoretical distribution may fit the data.
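The stem-and-leaf procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `stem_and_leaf` and the values in `sample_lives` are assumptions made for the example and are not the battery data of the table. It also assumes every value is recorded to one decimal place.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: sorted leaves} for values recorded to one decimal place."""
    display = defaultdict(list)
    for v in values:
        tenths = round(v * 10)            # 4.1 -> 41, avoids floating-point surprises
        stem, leaf = divmod(tenths, 10)   # 41 -> stem 4, leaf 1
        display[stem].append(leaf)
    # Sort the stems, and sort the leaves on each stem
    return {stem: sorted(leaves) for stem, leaves in sorted(display.items())}


# Hypothetical lives in years (illustrative only, not the data of the table)
sample_lives = [4.1, 5.2, 2.8, 4.9, 5.6, 4.0, 4.1, 4.3, 5.4, 4.5, 6.1, 3.7]

display = stem_and_leaf(sample_lives)
for stem, leaves in display.items():
    leaf_text = "".join(str(leaf) for leaf in leaves)
    print(f"{stem} | {leaf_text:<8} Frequency: {len(leaves)}")
```

Note that this sketch lists only stems that have at least one leaf. A stem with no data would have to be added explicitly if an empty row is wanted in the display.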
A box plot, or box-and-whisker plot, is a graphical device for displaying certain characteristics of a frequency distribution. A narrow box extends from the lower quartile to the upper quartile. Thus the length of the box represents the interquartile range, a measure of variability. The median is marked by a line extending across the box. The smallest value in the distribution and the largest value are marked, and each is joined to the box by a straight line, the whisker. Thus, the whiskers represent the full range of the data.
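The five quantities that a box plot displays can be computed as in the sketch below. The function name `five_number_summary` and the sample values are hypothetical. Several conventions exist for calculating quartiles; this sketch uses the "inclusive" method of Python's `statistics.quantiles`, which may give slightly different quartiles from the convention used for the figures in this text.

```python
import statistics


def five_number_summary(values):
    """Return the quantities marked on a box plot."""
    data = sorted(values)
    # Quartile conventions differ; "inclusive" is an assumption made here
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "smallest": data[0],
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "largest": data[-1],
    }


# Hypothetical lives in years (illustrative only, not the data of the table)
sample_lives = [2.8, 3.7, 4.0, 4.1, 4.3, 4.9, 5.2, 5.6, 6.1]

summary = five_number_summary(sample_lives)
interquartile_range = summary["upper_quartile"] - summary["lower_quartile"]  # length of the box
full_range = summary["largest"] - summary["smallest"]                        # span of the whiskers

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print(f"interquartile range: {interquartile_range:.2f}")
print(f"range: {full_range:.2f}")
```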
Figure: Box Plot for Life of Battery is a box plot for the data of Table: Battery Lives, years on the life of batteries under industrial conditions. The labels “smallest,” “largest,” “median,” and “quartiles” are usually omitted.
Box plots are particularly suitable for comparing sets of data, such as before and after modifications were made in the production process. Figure: Comparison of Box Plots shows a comparison of the box plot of Figure: Box Plot for Life of Battery with a box plot for similar data under modified production conditions, both for the same sample size. Although the median has not changed very much, we can see that the sample range and the interquartile range for modified conditions are considerably smaller.
Frequency Graphs of Discrete Data
Table: Frequencies for Numbers of Defectives
Number of defectives, x_i | Frequency, f_i
These data can be shown graphically in a very simple form because they involve discrete data, as opposed to continuous data, and only a few different values. The variate is discrete in the sense that only certain values are possible: in this case the number of defective items in a group of six must be an integer rather than a fraction. In this example the number of defective items in each group is only 0, 1, or 2. The frequencies of these numbers are shown above. When the frequencies are plotted against the number of defectives, the graph consists of isolated spikes, which correspond to the discrete character of the variate.
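Counting the frequency of each discrete value is straightforward to sketch in code. The list `defectives_per_group` below is hypothetical and is not the data of the table. The text output stands in for a frequency graph, with one row of bars per value playing the role of a spike.

```python
from collections import Counter

# Hypothetical numbers of defectives found in groups of six items
# (illustrative only, not the data of the table)
defectives_per_group = [0, 1, 0, 0, 2, 0, 1, 0, 0, 1, 0, 0]

# Frequency f_i of each observed value x_i
frequencies = Counter(defectives_per_group)

print("x_i  f_i")
for x in sorted(frequencies):
    # One bar per occurrence gives a crude text version of the spike
    print(f"{x:>3}  {frequencies[x]:>3}  {'|' * frequencies[x]}")
```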
If the number of different values is very large, it may be desirable to use the grouped frequency approach.
Continuous Data: Grouped Frequency
If the variate is continuous, any value at all in an appropriate range is possible. Between any two possible values there are infinitely many other possible values, although measuring devices are not able to distinguish some of them from one another.
Measurements will be recorded to only a certain number of significant figures. Even to this number of figures, there will usually be a large number of possible values. If the number of possible values of the variate is large, too many values appear in a table or graph for easy comprehension. We can make the data easier to comprehend by dividing the range of the variate into intervals, or classes, and counting the frequency of occurrence for each class. This is called the grouped frequency approach.
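A minimal sketch of the grouped frequency approach follows. The function name `grouped_frequency`, the measurements, and the choice of class boundaries are all assumptions made for illustration. The sketch also assumes that each class includes its lower boundary and excludes its upper boundary; other conventions are possible.

```python
def grouped_frequency(values, lower, width, n_classes):
    """Count values in n_classes equal-width classes starting at `lower`.

    Each class is taken as lower boundary <= value < upper boundary
    (an assumed convention).
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"value {v} lies outside the chosen classes")
        counts[index] += 1
    return [
        (lower + i * width, lower + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]


# Hypothetical continuous measurements (illustrative only)
measurements = [12.3, 14.8, 15.1, 17.6, 18.2, 19.9, 21.4, 22.0, 23.7, 16.5]

classes = grouped_frequency(measurements, lower=10, width=5, n_classes=3)
for low, high, count in classes:
    print(f"{low:>4} to under {high:<4} frequency: {count}")
```

Values that fall outside the chosen classes raise an error rather than being silently dropped, so the class frequencies always add up to the number of observations.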