pith:FMNA6MFM
Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation
AI models for scoring short answers agree well with experts on fully correct and incorrect responses but show major degradation on mid-range ones, with less degradation after more task-specific adaptation.
arxiv:2605.07647 v2 · 2026-05-08 · cs.CL · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{FMNA6MFM4LMM675Y5NEUOJSFEW}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
All AI models perform well on fully correct and fully incorrect responses, but exhibit substantial degradation on mid-range responses. This mid-range degradation is conditioned on task-specific adaptation: It is most severe in few-shot LLMs with few examples and decreases as task-specific data increases, with fine-tuned encoder models performing best.
The ground-truth scores assigned by a single biology education expert accurately capture the nuanced interpretation required for mid-range responses and serve as a stable reference for measuring model agreement.
AI short-answer scorers show mid-range quality degradation that lessens with more task-specific adaptation, while human agreement stays stable across the quality spectrum.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-26T01:03:32.698666Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
2b1a0f30ace2d8cf7fb8eb49472645259e0bf7a9a8941c1acc23b2b75bf26860
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FMNA6MFM4LMM675Y5NEUOJSFEW \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2b1a0f30ace2d8cf7fb8eb49472645259e0bf7a9a8941c1acc23b2b75bf26860
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "ba8218a2b62e2d822a7b71b86e934217765edabe2eca7b1d92aa8c5083654b8e",
"cross_cats_sorted": [
"cs.AI"
],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.CL",
"submitted_at": "2026-05-08T12:12:01Z",
"title_canon_sha256": "c0a7bf240daa90e44d55a71fbdb8da623e29af50529a4db4ca867715d76f01e5"
},
"schema_version": "1.0",
"source": {
"id": "2605.07647",
"kind": "arxiv",
"version": 2
}
}
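The one-liner under "Verify this Pith Number yourself" can be unpacked into a small Python sketch that makes the canonicalization step explicit: parse the record, re-serialize it with sorted keys and compact separators, then take the SHA-256 of the UTF-8 bytes. The helper names `canonical_bytes` and `canonical_hash` are hypothetical, and the sketch assumes the record listed above is identical to the `canonical_record` field the API returns. It does not check the Ed25519 signature.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" listing above.
# Assumption: this matches the `canonical_record` field served by the API.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "ba8218a2b62e2d822a7b71b86e934217765edabe2eca7b1d92aa8c5083654b8e",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-08T12:12:01Z",
    "title_canon_sha256": "c0a7bf240daa90e44d55a71fbdb8da623e29af50529a4db4ca867715d76f01e5"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.07647", "kind": "arxiv", "version": 2}
}
"""

# Canonical hash as published on the record page.
EXPECTED = "2b1a0f30ace2d8cf7fb8eb49472645259e0bf7a9a8941c1acc23b2b75bf26860"


def canonical_bytes(record):
    """Serialize with sorted keys and no whitespace, as in the shell one-liner."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode()  # UTF-8 by default


def canonical_hash(record):
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(json.loads(RECORD_JSON))
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the digest does not depend on the order in which the JSON happens to be written, which is why the pretty-printed listing and the compact API response should hash to the same value.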