Reporting assessment results with box and whisker plots

Daisy Christodoulou
The No More Marking Blog
Dec 9, 2017


The typical way of reporting a set of assessment results looks like this: the percentage of pupils at each grade.
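Here is a minimal Python sketch of how such a table of percentages can be computed. It assumes each pupil has a numeric score and that grades are assigned by fixed cut scores. The scores, cut scores and grade labels are hypothetical, not No More Marking's actual values or method.

```python
from bisect import bisect_right
from collections import Counter


def grade_percentages(scores, boundaries, labels):
    """Return the percentage of pupils at each grade.

    `boundaries` is a sorted list of (hypothetical) cut scores; a score at or
    above boundaries[i] is given labels[i + 1].
    """
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one more label than boundaries")
    counts = Counter(labels[bisect_right(boundaries, s)] for s in scores)
    return {label: 100 * counts[label] / len(scores) for label in labels}


# Hypothetical scores for ten pupils and hypothetical cut scores of 50 and 70.
scores = [38, 45, 52, 55, 58, 61, 63, 67, 72, 80]
print(grade_percentages(scores, [50, 70], ["grade 1", "grade 2", "grade 3"]))
# {'grade 1': 20.0, 'grade 2': 60.0, 'grade 3': 20.0}
```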

At No More Marking, we provide you with that information and supplement it with results from other schools, which help you put your own results into context.

However, we think this style of reporting is limited. It forces your pupils into three quite arbitrary categories, and it doesn’t give you any idea of the full range of results within each grade. At its worst, as we have written about here, this style of reporting can be really distorting.
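The problem of the hidden range within each grade can be illustrated with two hypothetical cohorts that produce exactly the same grade table. In this sketch the scores, cut scores and labels are all made up for illustration: both cohorts have 20% / 60% / 20% of pupils in the three grades, yet the lowest-grade pupils sit just below the boundary in one cohort and far below it in the other.

```python
from bisect import bisect_right


def summarise_by_grade(scores, boundaries, labels):
    """For each grade, return (percentage of pupils, lowest score, highest score)."""
    groups = {label: [] for label in labels}
    for s in scores:
        # A score at or above boundaries[i] falls into labels[i + 1].
        groups[labels[bisect_right(boundaries, s)]].append(s)
    total = len(scores)
    return {
        label: (100 * len(g) / total, min(g, default=None), max(g, default=None))
        for label, g in groups.items()
    }


boundaries = [50, 70]                       # hypothetical cut scores
labels = ["grade 1", "grade 2", "grade 3"]  # hypothetical grade labels

# Two hypothetical cohorts with identical grade percentages (20% / 60% / 20%).
cohort_a = [48, 49, 51, 52, 53, 67, 68, 69, 71, 72]
cohort_b = [20, 30, 50, 52, 55, 60, 65, 69, 71, 95]

for name, cohort in [("Cohort A", cohort_a), ("Cohort B", cohort_b)]:
    for label, (pct, low, high) in summarise_by_grade(cohort, boundaries, labels).items():
        print(f"{name}, {label}: {pct:.0f}% of pupils, scores {low} to {high}")
```

Both cohorts print the same percentages, but Cohort B's grade 1 pupils score 20 to 30 while Cohort A's score 48 to 49, a difference that a percentages-only table cannot show.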

We also report results as box and whisker plots. You can see two of these below, based on the same data as the table above. The box and whisker plot on the left represents the results of just one anonymised school. The one on the right represents the results of all 6,273 pupils in our most recent pilot.
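A box and whisker plot is drawn from a handful of summary statistics: the box runs from the lower quartile (Q1) to the upper quartile (Q3), the line inside it marks the median, and the whiskers and individual points show the tails. The Python sketch below computes these. It assumes the common Tukey convention, where whiskers reach the most extreme scores within 1.5 × the interquartile range (IQR) of the box and anything beyond is plotted as an outlier; the convention and quartile method used for the charts in this post may differ. The scores are hypothetical, with one very low score included to mimic a single outlying pupil.

```python
import statistics


def box_plot_stats(scores, whisker_factor=1.5):
    """Summarise scores the way a box and whisker plot draws them.

    Assumes the Tukey convention: whiskers reach the most extreme scores
    within `whisker_factor` x IQR of the box; scores beyond are outliers.
    """
    data = sorted(scores)
    # "inclusive" quartiles interpolate linearly between the sorted scores.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


# Hypothetical scores for one school, including one very low outlier.
scores = [12, 41, 44, 46, 48, 50, 51, 53, 55, 58, 60]
print(box_plot_stats(scores))
# {'q1': 45.0, 'median': 50.0, 'q3': 54.0, 'whisker_low': 41, 'whisker_high': 60, 'outliers': [12]}
```

matplotlib's `boxplot` also defaults to 1.5 × IQR whiskers (its `whis` parameter), so a chart drawn with it should broadly match these numbers, although quartile methods vary slightly between tools.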

I think this tells you a lot more than just a table of grades.

  • For example, by comparing the horizontal black line in each box, which marks the median, we can see that this school’s results are slightly lower than the national results, but not by much!
  • We can see that this school doesn’t have many very low or very high performers compared with the national distribution, apart from the one outlying pupil right at the bottom.
  • We can see the national grade boundaries overlaid onto this chart, but we can also see that these are lines that have simply been superimposed on top of the distribution. Pupils just either side of the GDS line are likely to have more in common with each other than with pupils at the other end of their own grade.
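A small worked example of the boundary point, using hypothetical cut scores of 50 and 70, with 70 standing in for the GDS line (these are illustrative numbers, not the real boundaries):

```python
from bisect import bisect_right

# Hypothetical cut scores: 50 starts grade 2, and 70 stands in for the GDS line.
BOUNDARIES = [50, 70]
LABELS = ["grade 1", "grade 2", "grade 3 (GDS)"]  # hypothetical labels


def grade(score):
    """Assign a grade from a score using the hypothetical cut scores."""
    return LABELS[bisect_right(BOUNDARIES, score)]


def compare(a, b):
    """Return the score gap between two pupils and whether they share a grade."""
    return abs(a - b), grade(a) == grade(b)


print(compare(69, 70))  # (1, False): one point apart, but different grades
print(compare(50, 69))  # (19, True): nineteen points apart, but the same grade
```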

We can also use the same approach to compare groups of pupils, for example by gender or by Pupil Premium status. Take a look at the following breakdown by Pupil Premium. We can see that this school’s Pupil Premium (PP) pupils do about as well on average as the national Pupil Premium cohort, and also that there are almost no very low-attaining PP pupils at this school.
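A breakdown like this repeats the box plot summary for each group. A minimal Python sketch, assuming each pupil record is a hypothetical (group, score) pair; running the same function over national data would, in principle, give the comparison boxes for the national cohort.

```python
import statistics
from collections import defaultdict


def summarise_groups(records):
    """Group (group_label, score) pairs and return [Q1, median, Q3] per group."""
    groups = defaultdict(list)
    for label, score in records:
        groups[label].append(score)
    return {
        label: statistics.quantiles(scores, n=4, method="inclusive")
        for label, scores in sorted(groups.items())
    }


# Hypothetical pupils: (Pupil Premium status, score).
records = [
    ("PP", 44), ("PP", 47), ("PP", 50), ("PP", 52), ("PP", 55),
    ("non-PP", 40), ("non-PP", 46), ("non-PP", 51), ("non-PP", 54), ("non-PP", 59),
]
for label, (q1, median, q3) in summarise_groups(records).items():
    print(f"{label}: Q1={q1}, median={median}, Q3={q3}")
```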

We think the box and whisker plot is an easy way to represent a lot of information without any big distortions. However, it still can’t tell you everything. In particular, it doesn’t tell you whether the differences between the groups you are comparing are statistically significant. How big do the differences between two groups have to be before they really mean something? In our next post, we will look at error bar plots, which can tell you this.
