Generative-Evaluative Agreement: A Necessary Validity Criterion for LLM-Enabled Adaptive Assessment
The paper introduces Generative-Evaluative Agreement (GEA) as a validity criterion for LLM-enabled adaptive assessment. It measures how well an LLM's scoring function aligns with the skill levels it was designed to evaluate. The study finds that agreement is strong for some skills and weak for others, suggesting the need for improved assessment rubrics.
- Generative-Evaluative Agreement (GEA) is a new validity criterion for LLM-enabled adaptive assessment.
- The study found that GEA recovers about half of the intended variance in skill levels.
- GEA is high for syntactically verifiable skills but low for design-level skills.
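To make "recovers about half of the intended variance" concrete, here is a minimal sketch of one way such agreement could be computed per skill. The summary does not state the paper's formula, so this assumes the squared Pearson correlation between intended and scored skill levels as the measure. The function names, the record layout, and the toy data are all hypothetical and do not come from the paper.

```python
from statistics import mean


def variance_recovered(intended, scored):
    """Squared Pearson correlation between intended and scored levels.

    Assumption: this stands in for "share of intended variance recovered";
    the paper's actual metric may differ.
    """
    if len(intended) != len(scored) or len(intended) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mi, ms = mean(intended), mean(scored)
    cov = sum((i - mi) * (s - ms) for i, s in zip(intended, scored))
    var_i = sum((i - mi) ** 2 for i in intended)
    var_s = sum((s - ms) ** 2 for s in scored)
    if var_i == 0 or var_s == 0:
        # A constant series carries no recoverable variance.
        return 0.0
    return cov * cov / (var_i * var_s)


def agreement_by_skill(records):
    """Group (skill, intended, scored) triples and compute agreement per skill."""
    grouped = {}
    for skill, intended, scored in records:
        levels = grouped.setdefault(skill, ([], []))
        levels[0].append(intended)
        levels[1].append(scored)
    return {skill: variance_recovered(i, s) for skill, (i, s) in grouped.items()}


# Made-up toy data: the scorer tracks the intended level for "syntax"
# and returns the same score regardless of intent for "design".
toy_records = [
    ("syntax", 1, 1), ("syntax", 2, 2), ("syntax", 3, 3),
    ("design", 1, 2), ("design", 2, 2), ("design", 3, 2),
]
print(agreement_by_skill(toy_records))
```

On the toy data the "syntax" skill scores 1.0 and the "design" skill scores 0.0, mirroring the kind of per-skill contrast the bullets describe. Those values come from the invented records, not from the study.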
Opening excerpt (first ~120 words)
arXiv:2605.19529 (cs), submitted on 19 May 2026
Authors: Grandee Lee, Yue Wang, Che Yee Lye, Luke Peh
Abstract: When the same LLM generates assessment items, simulates student responses, and scores them, the validation loop is self-referential.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.