The limits of GCSE English marking

Chris Wheadon
The No More Marking Blog
Oct 21, 2019 · 4 min read

--

Take a look at these two essays, written in response to AQA Question 5, which asked pupils to write a description of an old person as suggested by a photograph.

One of them is worth 22 marks and was written by a year 10 pupil.

One of them is worth 24 marks and was written by a year 11 pupil.

To put it another way, one of these pieces has benefitted from a year of additional teaching.

Which is it?

To help us, let’s take a look at the mark scheme:

Communication
Technical Accuracy

There are 24 marks available for content and organisation, and 16 for technical accuracy.

Starting with communication, the difference between upper and lower level 3 is the difference between ‘consistent’ and ‘general’. I can find lapses in both the above pieces: ‘Nothing could stop it.’ — the weather? What does it mean to stop the weather? Or, for the piece on the right, ‘if you were small enough, I’m pretty sure you wouldn’t find a way out.’ A small enough what?! So, let’s say lower level 3, somewhere between 13 and 15 marks for both.

Now onto technical accuracy. Hmm… I can make a good case for level 3 for both pieces, so I’m now somewhere between 22 and 27 marks for each, but with no further advice in the mark schemes on how to narrow the marks down. I’m left with digging out exemplars or using my own internalised standards.

It is little surprise that Ofqual’s research finds a +/- 5 mark difference for essay marking on a 40-mark scale, as I could easily justify 22 or 27 for either essay. Ofqual’s research suggests that at this point in the process markers refer to internalised schemas to reach their final decision. Perhaps we could improve the mark scheme? I doubt it. Research suggests that levels-of-response mark schemes such as this one are as good as it gets for marking open-ended writing.

So how did we conclude that one piece is worth 22 marks and the other 24? Simple! We had 20 teachers compare each script to a range of other scripts drawn from a large sample, and we then applied a fancy statistical model to their decisions. The process is called Comparative Judgement. And believe it or not, it’s actually quicker than marking!
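To make the process concrete, here is a minimal sketch of the kind of statistical model that sits behind Comparative Judgement. The standard choice is the Bradley-Terry model, which turns many pairwise "this one is better" decisions into a quality score per script; this is an illustration only, and No More Marking's actual model may differ.

```python
import math

def win_probability(theta_i, theta_j):
    """P(script i is judged better than script j) under Bradley-Terry."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))

def fit_bradley_terry(n_scripts, comparisons, lr=0.05, epochs=500):
    """Estimate a quality score per script from (winner, loser) judgement
    pairs by gradient ascent on the Bradley-Terry log-likelihood."""
    theta = [0.0] * n_scripts
    for _ in range(epochs):
        grad = [0.0] * n_scripts
        for winner, loser in comparisons:
            p = win_probability(theta[winner], theta[loser])
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        theta = [t + lr * g for t, g in zip(theta, grad)]
        mean = sum(theta) / n_scripts  # anchor the scale at zero
        theta = [t - mean for t in theta]
    return theta

# Hypothetical example: script 1 wins 8 of 10 head-to-head judgements
# against script 0, so its fitted score ends up higher.
scores = fit_bradley_terry(2, [(1, 0)] * 8 + [(0, 1)] * 2)
print(scores[1] > scores[0])
```

The point of the aggregation is that no single judge's decision matters much: with 20 judges each making many comparisons, the fitted scores stabilise even though any one comparison is noisy.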

The scripts haven’t been chosen at random, either. The year 10 piece is the average you can expect from year 10 in September, while the year 11 piece is the average you can expect from year 11 in September. So how much progress do pupils typically make between year 10 and year 11 on this question? Two marks, which is almost too small to see! According to our model the better piece would be chosen by 6 people out of 10, not a huge difference, but it is a significant one.
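That 6-in-10 figure can be translated into a gap on the model's underlying quality scale. Assuming a Bradley-Terry-style model (the usual choice for Comparative Judgement; the post doesn't name the exact model used), a 62% preference rate implies a gap equal to the logit of 0.62:

```python
import math

# If the stronger script wins a comparison with probability p, the
# implied gap on the logit (quality) scale is log(p / (1 - p)).
# Assumption: a Bradley-Terry-style model.
p = 0.62
gap = math.log(p / (1 - p))
print(round(gap, 2))  # about 0.49
```

A gap of roughly half a logit is real but small, which is exactly why it is so hard to detect with ordinary marking.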

Without aggregation and a fancy statistical model you may not be sure your pupils are making any progress on this question between year 10 and year 11.

Next time you get a set of essays back from your pupils and feel like banging your head against the wall, don’t. It’s not that the difference isn’t there and you’re wasting your time; it’s just that the difference can be hard to see.

Now if you really want to know if there’s a difference, throw away those mark schemes and join us in judging. Oh, and which script is which in the above example? I’ll leave it to you to decide. You have a 62% chance of being correct!

You can sign up and judge anything you like for free at https://www.nomoremarking.com/ or join our national GCSE English project here: https://www.nomoremarking.com/products/age2
