Showing distributions of results

If you have multiple values for a set of experimental conditions (more than about 10) then you may wish your reader to understand how they are distributed. The number of points and how you want to show them will affect which type of plot you choose.

Let’s look at an example.

We’ll return to the flow experiment from the previous step. Assume that you have 1000 data values for the flow through pipe 1 (not unusual if you use a data logger).

You could illustrate these on a scatter plot as shown below. This would show the chaotic nature of the readings, but discerning features of the distribution is difficult.

A chaotic scatter plot showing flow rates measured through a pipe. The y-axis is labelled flow in litres per minute, and the x-axis is labelled reading number, running from zero to 1000. There are 1000 data values on the plot, and due to the number it is difficult to discern the features of the distribution.

Figure 1: A scatter plot with 1000 data values.

Rank the data points

Simply ordering the data points from smallest to largest lets the reader see the distribution of the readings more clearly. If the ranked data plots as a straight line, then the data is uniformly distributed throughout its range. However, this is unusual.
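
As a rough illustration of the ranking step, here is a minimal Python sketch. The readings are randomly generated stand-ins for logged data, and the mean of 7.2 and spread of 0.5 litres per minute are assumptions made only for this example, not values from the flow experiment.

```python
import random

# Hypothetical stand-in for logged data: 1000 flow readings in litres per minute.
# The mean (7.2) and spread (0.5) are invented for illustration only.
rng = random.Random(0)
flows = [rng.gauss(7.2, 0.5) for _ in range(1000)]

# Ranking is simply sorting: the smallest reading gets rank 1, the largest rank 1000.
ranked = sorted(flows)
ranks = range(1, len(ranked) + 1)

# Each (rank, flow) pair is one point on the ranked scatter plot.
points = list(zip(ranks, ranked))
print("lowest reading: ", points[0])
print("highest reading:", points[-1])
```

The pairs in `points` can be passed to whichever plotting tool you prefer, with rank on the horizontal axis and flow on the vertical axis.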

The plot in Figure 2 is more typical, with curved ends. You can imagine the horizontal axis being stretched until the line is straight. Once this is done, the amount of stretching needed indicates which statistical distribution best describes the data, and the slope and intercept of the straight line give the parameters of that distribution.

An ordered scatter plot showing flow rates measured through a pipe. The y-axis is labelled flow in litres per minute, and the x-axis is labelled ranking, running from zero to 1000, with tick marks for 200, 400, 600, 800 and 1000. In this plot the data points are ranked in order so that the distribution of the readings is more clear.

Figure 2: In this scatter plot, the data points have been ranked in order so that the distribution of the readings is clear.
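
The idea of stretching the horizontal axis can be sketched in a few lines of Python. This example assumes that a normal distribution is the candidate being tested, which the text above does not specify. The readings are again invented stand-ins, and the plotting position (rank − 0.5)/n is one common convention rather than the only choice. Each rank is replaced by the matching quantile of a standard normal distribution. If the stretched plot is close to a straight line, its intercept estimates the mean and its slope estimates the standard deviation.

```python
import random
from statistics import NormalDist, fmean

# Hypothetical stand-in data (mean 7.2, spread 0.5 are assumptions), already ranked.
rng = random.Random(0)
flows = sorted(rng.gauss(7.2, 0.5) for _ in range(1000))
n = len(flows)

# "Stretch" the horizontal axis: replace rank i by the standard normal
# quantile of (i - 0.5) / n, an assumed plotting-position convention.
standard_normal = NormalDist()
z = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Least-squares straight line through the stretched plot: flow = intercept + slope * z
z_mean = fmean(z)
flow_mean = fmean(flows)
slope = (sum((a - z_mean) * (b - flow_mean) for a, b in zip(z, flows))
         / sum((a - z_mean) ** 2 for a in z))
intercept = flow_mean - slope * z_mean

print(f"estimated mean (intercept): {intercept:.2f} litres per minute")
print(f"estimated standard deviation (slope): {slope:.2f} litres per minute")
```

If the stretched plot still curves at the ends, that suggests a different distribution would describe the data better.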

Categorise the data into bins

You could also display the distribution as a histogram. This requires that you analyse the data, counting how many results fall into separate “bins”. In this example, these are the ranges of flow rates shown on the horizontal axis.

A histogram showing flow rates measured through a pipe with the results separated out into different "bins". The y-axis is labelled number of results and runs from zero to 9. The x-axis is labelled flow in litres per minute, and is arranged into eight bins corresponding to the distribution of measurements: 5.6 - 6.0, 6.0 - 6.4, 6.4 - 6.8, 6.8 - 7.2, 7.2 - 7.6, 7.6 - 8.0, 8.0 - 8.4 and 8.4 - 8.8. The bars on the histogram show the most results fall into the 6.8 - 7.2 bin at over 8 results.

Figure 3: In this histogram, the data has been categorised into ‘bins’.

Using a histogram is appropriate if you have chosen only a few bins (notice that even with just eight bins the labels on the horizontal axis are already quite difficult to read). It is usual to widen the bars of a bar chart being used as a histogram, as shown in Figure 3, so that neighbouring bars touch. This indicates that there are no empty bins between those shown.
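
The counting step behind a histogram can be sketched as follows. The bin edges match those shown in Figure 3 (eight bins of width 0.4 from 5.6 to 8.8 litres per minute), but the readings are invented stand-ins. Treating each bin as including its lower edge and excluding its upper edge is an assumption made for this example.

```python
import random
from bisect import bisect_right

# Hypothetical stand-in data (mean 7.2, spread 0.5 are assumptions).
rng = random.Random(0)
flows = [rng.gauss(7.2, 0.5) for _ in range(1000)]

# Bin edges as in Figure 3: 5.6, 6.0, ..., 8.8 litres per minute.
edges = [round(5.6 + 0.4 * i, 1) for i in range(9)]
counts = [0] * (len(edges) - 1)
outside = 0  # readings that fall below the first edge or at/above the last

for value in flows:
    # Index of the bin whose lower edge is <= value < upper edge.
    index = bisect_right(edges, value) - 1
    if 0 <= index < len(counts):
        counts[index] += 1
    else:
        outside += 1

for low, high, count in zip(edges, edges[1:], counts):
    print(f"{low:.1f} - {high:.1f}: {count}")
print("outside the binned range:", outside)
```

Keeping a separate tally of readings outside the chosen range is a simple check that the bins cover the data you intend to show.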

Use a line chart

If you wanted to show more details of the distribution, you could use a line chart. Here each point is at the centre of the bin and the continuous line indicates the implied distribution of measurements.

A line chart showing the distribution of measurements of flow through a pipe. The y-axis is labelled fraction of results and runs from 0% to 10%. The x-axis is labelled flow in litres per minute and runs from 5.5 to 8.5. Here each point is at the centre of the bin and a continuous line joins the data points, indicating the implied distribution of measurements.

Figure 4: In this line chart, each point is at the centre of the bin.
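
To prepare the points for a line chart like this, you need the centre of each bin and the fraction of results that falls in it. The sketch below assumes 30 bins of width 0.1 covering 5.5 to 8.5 litres per minute. This bin width is a hypothetical choice, as the source does not state the one used for Figure 4, and the readings are invented stand-ins.

```python
import random

# Hypothetical stand-in data (mean 7.2, spread 0.5 are assumptions).
rng = random.Random(0)
flows = [rng.gauss(7.2, 0.5) for _ in range(1000)]

# Assumed binning: 30 narrow bins of width 0.1 starting at 5.5 litres per minute.
start, width, n_bins = 5.5, 0.1, 30
counts = [0] * n_bins

for value in flows:
    index = int((value - start) // width)  # floor division locates the bin
    if 0 <= index < n_bins:                # readings outside the range are skipped
        counts[index] += 1

# x-values: the centre of each bin; y-values: percentage of all readings in it.
centres = [start + (i + 0.5) * width for i in range(n_bins)]
fractions = [100 * count / len(flows) for count in counts]

for centre, fraction in zip(centres, fractions):
    print(f"{centre:.2f}  {fraction:.1f}%")
```

Joining the (centre, fraction) pairs with a line gives the implied distribution. Because the percentages are taken over all readings, they sum to slightly less than 100% if any readings fall outside the binned range.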

In summary…

If you want to illustrate a distribution of results:

  • use a bar chart (histogram) if you have a small number of histogram bins
  • use a scatter plot to show a cumulative distribution (the ranked plot), or join the points with a line to plot a histogram with a large number of bins