Thursday, 12 November 2009

Computer says D minus

Much fun was had today with the Computer that Failed Churchill.

They are some of the most memorable and stirring words of the 20th century, but Churchill’s speech exhorting the British to “fight on the beaches” would fail if submitted as a school essay and subjected to a proposed computerised marking system. The wartime leader had a style that was too repetitive, according to the computer being tested for the online marking of school qualifications. It rated Churchill as below average in the equivalent of an A level English exam.

Of course, it isn't entirely fair: the rhythmic structure of the passage, with its reliance on anaphora and strong grounding in the monosyllables of Old English ("might" used as a noun, for example), could scarcely be more unlike the style expected of an English literature exam, which the computer was programmed to recognise. An analysis of Hamlet written in the style of a Churchillian oration would be strange indeed, and almost certainly inappropriate. Even a human examiner might find it lacking. It was more troubling, actually, that the computer had a problem with Hemingway, whose simplicity of sentence construction is a model adaptable to any form of prose.

Be that as it may, the conclusion we were meant to draw was that computers are not yet up to the exacting task of marking sophisticated exam scripts. That was the message David Wright, chief executive of the Chartered Institute of Educational Assessors, wanted to impart. British exam boards are currently experimenting with computerised assessment, with enthusiasts claiming that the results can be more accurate than those of fallible human beings. But of course a machine lacking in consciousness can only ever provide a simulacrum of assessment, and is certain to have unpredicted pitfalls. In the United States, where the practice is apparently widespread already, we're told that teenagers have developed techniques for "schmoozing the computer". But a computer has no personality, and it cannot be schmoozed. Strictly speaking, it can't even be tricked. It blindly does what it is programmed to do.

The prospect of computers marking any examination more sophisticated than a multiple-choice test is unnerving, and rather depressing. Writing an essay is - or at least ought to be - more than making a list of points: it is an argument, a piece of rhetoric, an attempt at persuasion. It assumes the presence of a thinking human being at the receiving end. If an examination script is capable of being marked, properly, by a computer, then it is almost by definition less a demonstration of mastery than a regurgitation, little more, perhaps nothing more, than a memory test. Where the assessors are machines, there will be little or no scope for variety or even intelligence in the answers.

But to a large extent the introduction of computers would be a mere formality - or, at most, the final stage in a process that is already far advanced. There was a time, not that long ago, when marking exams was a skill that required real engagement with the script. Yes, there were guidelines. Yes, there were points that an examinee was expected to make to demonstrate their command of the material and ability to answer the question. But there was considerable leeway. At least in arts subjects, points could be awarded for interesting arguments, striking phraseology, the introduction of relevant facts and information that wasn't strictly on the curriculum - in a word, flair. Say what was expected, and you'd get a B; for an A slightly more was required.

These days, rigid marking schemes don't merely ignore these subtle indicators of deep understanding; they may penalise them. In 2006, Brighton headmaster Richard Cairns complained that his most intelligent candidates were being marked down because their answers were better than required. He instructed GCSE candidates not to "think outside the box" to avoid being penalised for giving unexpected answers. "There are even key words they are looking for in your answers," he told his teenagers. "If you mention these three, four, five key words, you will get the maximum marks available. But you could have an incredibly sophisticated response that doesn't mention those key words and you will be penalised for it."

Mary Beard made a similar point the following year. "I know of at least one A level examiner," she wrote, "who has given up because he was forced to mark down candidates who wrote really intelligently about a subject but didn't give the points that were demanded by his 'marking criteria'." She put it down to "our recent mad fixation with formal assessment", which has necessitated more and more exams, all of which have to be marked, tick-box style, by much less experienced examiners. There's also less time to cultivate the art of essay-writing, and a widespread belief that the only skills that matter are those that can be quantified. Such a system is almost designed to render superfluous the human element provided by a careful, skilled examiner - so that the job becomes so mechanical, and offers so little scope for intelligent discretion, that it might as well be done by a computer.

Computerisation is more than the logical next step; it is an acknowledgement of reality. As in so many other areas of life, computers can do the same job as human beings, can do it more accurately and at a fraction of the cost. And they can scarcely be accused of bias. It would also be highly appropriate. What are exams for, anyway, if not to demonstrate the candidate's absorption of those skills and pieces of information required to survive in the modern world? Society is now run by computers; and computers, incapable of consciousness or rational thought though they be, are set up in judgement over us. In a society that fetishises consistency, objectivity and predictability of outcome, they set the gold standard. An algorithm decides if you can have a mortgage, if your personality is sufficiently in conformity with that required for the job you want to do, whether or not you are at risk of a medical condition, what tempting offers are likely to entice you to shop in Sainsbury's. In the brave new world of databases and ID cards the computer will tell you - and everyone else who wants to know - who you are, and from its decision there may be no appeal. So what could determine better than a computer how well you understand Jane Austen?

(P.S. I ran this article through an online style checker and scored 94% for clarity - not bad.)