There is an increasing belief that with just a bit more computing power, or another embedded chip, we can fix anything from global poverty to climate change. As computers become better at creating, collecting and analysing large chunks of data, and at building ever more advanced models of how our world operates, there is a risk that we become blind to technology’s limitations. And when we become blind to its limitations, we increasingly apply it in ways that we shouldn’t.
There are two excellent examples of this currently in the news. The first is the Department of Education’s proposal to use Automated Essay Scoring (AES) to assess the written component of the NAPLAN. The second and related example is the use of algorithms to assess English proficiency as a pre-requisite for either a work visa or permanent residency in Australia.
The argument for the use of these types of technologies normally comes down to two factors. The first, which you most often hear from the supporters of AES and computer scoring, is that using algorithms in assessments means tests are scored more consistently, reducing the potential influence of personal bias. The second, which you most often hear from the detractors, is that a computer can scan and score tests much faster than a human can, happily works long hours and weekends, and isn’t a member of a union.
Supporters claim that the algorithms consistently score more accurately than humans and that any impact on employment is just the price of progress. But this argument is simplistic and ignores the fundamental disconnect between the task that the human and the computer undertake in scoring an English test. This disconnect is the ability to understand meaning.
The ability to understand and convey meaning is a skill that humans have but technology lacks. AES is effectively a statistical analysis of words, sentences and punctuation. It can use this to ‘indicate’ the author’s grasp of the English language but at no point can the algorithm assess whether what was written (or said) was meaningful or even understandable.
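To make this concrete, here is a toy sketch of how surface-feature scoring works. It is not any vendor’s actual algorithm, and the features and weights are invented for illustration; the point is that the score is driven entirely by countable properties of the text, with no model of whether the essay means anything.

```python
import re

def surface_features(essay: str) -> dict:
    """Countable properties of the text -- no notion of meaning."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "vocab_variety": len({w.lower() for w in words}) / max(len(words), 1),
    }

def toy_score(essay: str) -> float:
    """Weighted sum of surface features, capped at 6 (illustrative weights)."""
    f = surface_features(essay)
    raw = (0.01 * f["word_count"]
           + 0.4 * f["avg_word_length"]
           + 0.05 * f["avg_sentence_length"]
           + 1.0 * f["vocab_variety"])
    return min(round(raw, 2), 6.0)

# Long, polysyllabic gibberish can out-score short, meaningful prose:
gibberish = ("Educatee tuition proliferates because pedagogical "
             "assistants promulgate a plethora of remunerative benefactions.")
plain = "Tuition is high. Fees go up every year. Students pay more."
```

Under this kind of scheme the gibberish sentence scores higher than the plain one, because it has longer words and a longer sentence — which is exactly the weakness the Babel Generator exploits.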
This would appear to be a fatal flaw and has been highlighted quite spectacularly by one of AES’s main detractors. Les Perelman is a former director of writing at MIT who created the Babel Generator, an algorithm designed to create gibberish essays that scored highly on AES software. Take, for example, this piece arguing that college tuition is high because of greedy teaching assistants.
“The average teaching assistant makes six times as much money as college presidents… In addition, they often receive a plethora of extra benefits such as private jets, vacations in the south seas, starring roles in motion pictures.”
The essay was given the top score of 6 out of 6.
But there is another and potentially bigger issue that this highlights. If algorithms are only assessing the indicators of good communication rather than the ability to effectively understand and convey meaning, then the algorithm is also incapable of giving usable feedback. It could suggest you use more big words, create longer sentences, and remember to capitalise proper nouns but this is a somewhat superficial assessment of your ability to communicate.
Take Alice Xu, a childcare worker from China who obtained a master of education in Australia and speaks fluent English. She took the test and scored 41 out of a possible 90. A year later, after tutoring, she achieved a perfect score. As Alice put it: “I didn’t improve my English, I just changed the way I took the test, I did it by learning how the computer worked, I don’t think my English skills or ability improved in any way. This exam is really about your test-taking skills, it’s not about your speaking or language ability.”
This is a classic case of how measuring the things that really matter is difficult. So instead, we take the things we can measure and make them the things that matter. It is hard to measure love, good character, happiness and value, so instead we measure likes, money, time, sentence length and word count.
Even the proponents of NAPLAN suggest it was never meant to be a ‘high stakes’ assessment but, in the absence of other assessments capable of capturing a broader understanding of what it means to be a good human, it has become one. Schools are narrowing their curriculum and engaging NAPLAN ‘experts’ to help them improve their scores. Students are becoming more stressed and parents are hiring tutors. Whether it was intended or not, NAPLAN has now become a thing that matters. The unintended consequence is that children develop into excellent test-takers at the expense of being caring, loving and creative human beings.
This is not to suggest that technology doesn’t have a place in helping assess student outcomes. Firstly, it would seem reasonable to apply computers in the testing of things that are computable. Areas such as mathematics, physics and chemistry often involve discrete answers that are either right or wrong. If the test involved showing how an answer was derived, a good testing algorithm could even point out where the mistake was made and provide direction on how to avoid similar mistakes in the future.
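A minimal sketch of what such step-checking could look like, assuming a hypothetical format in which the student submits each step of their working as an expression (this is not a real testing product, just an illustration of why computable subjects suit automated marking):

```python
def check_working(steps):
    """steps: list of expression strings the student claims are all equal,
    e.g. ["3*(4+2)", "3*6", "18"]. Returns (True, None) if every step
    preserves the value, else (False, index_of_first_wrong_step)."""
    # eval is safe enough here only because this is a demo with trusted input
    values = [eval(s, {"__builtins__": {}}) for s in steps]
    for i in range(1, len(values)):
        if values[i] != values[i - 1]:
            return False, i
    return True, None

ok, where = check_working(["3*(4+2)", "3*6", "18"])          # correct working
bad, where_bad = check_working(["3*(4+2)", "3*4+2", "14"])   # order-of-operations slip
```

Because every step has a single computable value, the checker can not only mark the answer wrong but point to the exact step where the order-of-operations mistake was made — precisely the kind of usable feedback an essay-scoring algorithm cannot give.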
It is even reasonable to employ AES or similar technologies in the assessment of creative, subjective and meaning-driven subjects such as English Literature and Art. It is just important that we are conscious of a technology’s limitations and apply it accordingly. In the case of AES, we need to acknowledge that the computer’s assessment is, and should always be, secondary to what the human understands.
Just as it has in the past, the technology will continue to get better, and the arguments will become more persuasive that it can do the same job a human does, just faster and cheaper. For some work (the information-driven, logical, right/wrong types of work) these claims will be entirely true. In such cases, the best strategy is to embrace the technology and move on to other things. But for other work, work based in purpose, communication, creativity and meaning, this will always be a lie, no matter how hard it may be to tell the difference. In these cases it is important that we continue to fight for our humanity, to do the work that matters even when we can’t measure it.
This blog post has been syndicated to Medium. If you’d like to add comments or ideas, head over to this page.