
Analysing the impact of AI on assessments in higher education
The Hindu
The silent bias in AI grading: Can a machine really judge creativity and critical thinking?
The growing use of Artificial Intelligence (AI) in education is revolutionising evaluation methods and learning environments. In assessing student work, AI-based grading systems offer objectivity, consistency, and efficiency. From automated essay scoring systems to standardised exams, these technologies are claimed to deliver fair and objective assessments. However, the precision of algorithms raises some significant concerns and questions: can AI really evaluate critical thinking and creativity? More importantly, are these systems inherently biased? Do they subtly distort assessments? Is human teacher assessment really free from prejudice? Can human evaluators give equally innovative responses the same marks? Reports from several Indian institutions have exposed incidents of biased assessment by human evaluators.
Whether the assessment is objective or descriptive, AI performs well in evaluating engineering and scientific disciplines, particularly when the underlying LLMs are given reference notes and probable solution strategies. It can effectively review thousands of student submissions, thereby relieving teachers of some of their workload and guaranteeing consistent marking. But when assessing subjective work such as essays, literary analysis, or philosophical arguments, AI evaluation is less appropriate, since subjectivity allows for several points of view and interpretations. The subjectivity of a student's answer cannot be constrained by strict criteria or limits.
Critical thinking and creativity do not follow strict rules. A student's capacity to offer original viewpoints, engage in sophisticated debate, or use metaphorical and symbolic language is tough for AI to gauge. AI often struggles to understand abstract concepts, humour, irony, and creativity, even though it can evaluate structural aspects, coherence, and lexical richness.
Within a limited period, AI can effectively evaluate objective-based criteria for large numbers of students. Unlike an objective-type question, a philosophical inquiry such as "what is beauty?" lacks a single, clear response. Rather, it invites several points of view, all of which could be reasonable. Similarly, Alfred Tennyson's poem Ulysses can yield different insights over several readings. Here, AI-assisted evaluation struggles to precisely assess the depth, nuance, and originality of subjective answers.
AI systems learn from large datasets, usually assembled from previously graded papers, which sometimes carry prejudices inherited from human assessors. Studies have revealed that AI graders can reward verbose writing, penalise non-native English speakers, or undervalue unorthodox ideas that deviate from the prevalent trends in the training data.
Sometimes, contextual understanding presents challenges for AI. In literary or philosophical essays, where arguments depend on historical or cultural background, AI's inability to infer deeper meanings may lead to erroneous assessments. An AI model trained on Western literature, for instance, might not correctly evaluate a work anchored in Eastern philosophy or indigenous storytelling traditions. However, Retrieval-Augmented Generation (RAG) technology can help reduce false information and increase accuracy.
One basic question arises: should AI completely replace evaluation by human teachers? Although it can help streamline assessment, it is difficult to remove human judgement entirely. Teachers contribute a necessary qualitative viewpoint that AI, in some circumstances, lacks. They grasp the complexity of arguments, recognise shifts in a student's perspective, and value originality in ways that machines cannot.