The Value of Human Graders


The appeal of computer grading for the speaking and writing sections of an English test is obvious. Instantaneous results that measure these skills would be useful for schools and English programs in a number of ways.

The trouble is, these results are not yet reliable. Anyone who has ever tried to use Siri or other voice-activated software can tell you it’s far from perfect. Is software that transcribes our text messages with hilarious inaccuracy really capable of determining a student’s language ability? Can we trust it with placement and admissions decisions that can have a major impact on a student’s future? Plus, there are advantages to human graders that may not be immediately apparent.

Content Matters in Our Subjective World

You may have noticed that word processing software assigns a “grade level” to your documents. This is accomplished using the Flesch-Kincaid formula, which estimates difficulty from surface features: the average number of words per sentence and the average number of syllables per word. Grading software operates on a similar principle. Writing built on sentences that hinge on a “therefore” or on conditional verb tenses can clock in at a higher level than writing samples using simpler structures, even if the idea expressed in short sentences is more complex.
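To make the limitation concrete, here is a minimal Python sketch of the Flesch-Kincaid grade level formula. The syllable counter is a rough heuristic we are assuming for illustration (it counts vowel groups), and the two sample texts are invented. This is not the code any word processor or test provider actually runs.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical samples: one long, formal sentence versus two short ones.
wordy = ("Therefore, if the examination had been administered differently, "
         "the candidate would have succeeded.")
terse = "Time bends. Light waits."

# The formula only sees length, so the wordy sample gets the higher grade
# regardless of which idea is actually harder to express.
print(flesch_kincaid_grade(wordy) > flesch_kincaid_grade(terse))
```

Nothing in the formula looks at meaning. Only sentence length and word length move the score, which is why a readability number alone cannot tell you whether a response makes sense.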

For instance, the following nonsense sentence taken from Jorge Luis Borges’ story “Tlön, Uqbar, Orbis Tertius” passes a grammar check on the leading word processor:

Upward, beyond the on streaming, it mooned.

Grading software used by test providers is likely more sophisticated than word processors. However, we communicate in a subjective world. The purpose of language is to communicate from one human being to another. At this point, only humans can truly answer questions such as, “Does the response make sense?” and “Is there a complete idea expressed in the response?”

Correcting for Human Error

A common criticism of human graders is that they are bound to vary in their estimation of the level of a test-taker’s speaking or writing. However, there’s a lot that can be done to shrink that margin of error, almost to the point of nonexistence.

At iTEP, we work with a wide network of trained ESL graders to score the speaking and writing sections of our tests (the multiple choice sections—listening, grammar, and reading—are scored by computer). Our graders are typically active ESL teachers working in the classroom in addition to grading tests for us. Many of them have 10, 20, or even 30+ years of classroom experience and a deep understanding of the psychology behind language learning. In addition to their qualifications, each of our graders is trained on a grading rubric we provide.

Once they begin grading exams for us, our graders do not work in isolation. We regularly conduct norming exercises that show how our graders’ evaluations compare to those of their peers, enabling them to recalibrate and adjust. Our master grader is in frequent contact with each of our graders, offering feedback and guidance.
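As a rough illustration of what a norming comparison can look like, the Python sketch below computes how far each grader sits, on average, from the panel's mean score for the same responses. The grader names, the scores, and the function `grader_bias` are all hypothetical, and this is only one simple way to compare graders, not a description of iTEP's actual procedure.

```python
from statistics import mean


def grader_bias(scores):
    """Return each grader's mean signed deviation from the panel average.

    scores maps a grader name to that grader's scores for the same
    responses, in the same order. The panel average includes the grader
    being measured. A positive result means the grader tends to score
    higher than the panel; a negative result means lower.
    """
    graders = list(scores)
    n = len(scores[graders[0]])
    if n == 0 or any(len(scores[g]) != n for g in graders):
        raise ValueError("every grader must score the same non-empty set of responses")
    panel_means = [mean(scores[g][i] for g in graders) for i in range(n)]
    return {
        g: mean(scores[g][i] - panel_means[i] for i in range(n))
        for g in graders
    }


# Hypothetical data: three graders scoring the same three responses.
sample = {
    "grader_a": [4.0, 3.0, 5.0],
    "grader_b": [4.0, 3.0, 5.0],
    "grader_c": [5.0, 4.0, 6.0],
}

for name, bias in grader_bias(sample).items():
    print(f"{name}: {bias:+.2f}")
```

A grader whose figure sits consistently above or below zero is the kind of pattern a norming exercise is meant to surface, so that the grader can recalibrate.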

A Matter of Priorities

Ultimately, we feel that grading software simply isn’t as reliable as human graders at this point, and without reliability, an English assessment tool has little value. So we do our best to compete against the advantages electronic grading does provide. We’ve condensed our grading turnaround to one business day for IEPs, and iTEP is one of the most affordable English tests for admissions.

Does the grading method play into the choice of English tests you use at your institution? Which factors do you consider most relevant? How would your ideal English test be graded?

Read this article as it originally appeared on the LinkedIn page of BES Director of Operations & Academic Content Marielle Marquette.

 
