Pith Number

pith:FMNA6MFM

pith:2026:FMNA6MFM4LMM675Y5NEUOJSFEW
not attested · not anchored · not stored · refs pending

Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation

Abigail Victoria Gurin Schleifer, Asaf Salman, Beata Beigman Klebanov, Giora Alexandron, Moriah Ariely

AI models for scoring short answers agree well with experts on fully correct and incorrect responses but show major degradation on mid-range ones, with less degradation after more task-specific adaptation.

arxiv:2605.07647 v2 · 2026-05-08 · cs.CL · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{FMNA6MFM4LMM675Y5NEUOJSFEW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

All AI models perform well on fully correct and fully incorrect responses, but exhibit substantial degradation on mid-range responses. This mid-range degradation is conditioned on task-specific adaptation: it is most severe in few-shot LLMs with few examples and decreases as task-specific data increases, with fine-tuned encoder models performing best.

C2 · weakest assumption

The ground-truth scores assigned by a single biology education expert accurately capture the nuanced interpretation required for mid-range responses and serve as a stable reference for measuring model agreement.

C3 · one line summary

AI short-answer scorers show mid-range quality degradation that lessens with more task-specific adaptation, while human agreement stays stable across the quality spectrum.
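
The kind of analysis these claims describe can be sketched as agreement computed per quality band: group responses by the expert's score, then measure how often the model's score matches within each group. The sketch below is only an illustration. The 0-2 score scale, the exact-match metric, the function name `agreement_by_band`, and the scores themselves are all assumptions made up for the example; this page does not state which agreement metric the paper uses.

```python
from collections import defaultdict


def agreement_by_band(expert, model):
    """Return the share of exact model/expert matches per expert score band."""
    if len(expert) != len(model):
        raise ValueError("expert and model score lists must have equal length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for e, m in zip(expert, model):
        totals[e] += 1
        hits[e] += int(e == m)
    return {band: hits[band] / totals[band] for band in sorted(totals)}


# Made-up scores on a hypothetical 0-2 scale (0 = incorrect, 2 = fully correct).
expert = [0, 0, 1, 1, 1, 1, 2, 2]
model = [0, 0, 1, 0, 2, 1, 2, 2]

result = agreement_by_band(expert, model)
print(result)
```

Reporting one number per band, rather than a single pooled figure, is what makes a mid-range drop visible: a pooled score can stay high when most responses sit at the extremes.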

Receipt and verification
First computed 2026-05-26T01:03:32.698666Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

2b1a0f30ace2d8cf7fb8eb49472645259e0bf7a9a8941c1acc23b2b75bf26860

Aliases

arxiv: 2605.07647 · arxiv_version: 2605.07647v2 · doi: 10.48550/arxiv.2605.07647 · pith_short_12: FMNA6MFM4LMM · pith_short_16: FMNA6MFM4LMM675Y · pith_short_8: FMNA6MFM
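
The short aliases are prefixes of the full 26-character Pith Number. The full number on this page also coincides with the first 26 characters of the standard (RFC 4648) base32 encoding of the canonical hash. That relationship is an observation from the values shown here, not a documented rule, and the helper name `pith_from_hash` is hypothetical. A minimal sketch:

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "2b1a0f30ace2d8cf7fb8eb49472645259e0bf7a9a8941c1acc23b2b75bf26860"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page relates to its hash.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


full = pith_from_hash(CANONICAL_HASH)
# The short aliases are plain prefixes of the full identifier.
short = {n: full[:n] for n in (8, 12, 16)}

print(full)
print(short)
```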
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FMNA6MFM4LMM675Y5NEUOJSFEW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2b1a0f30ace2d8cf7fb8eb49472645259e0bf7a9a8941c1acc23b2b75bf26860
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ba8218a2b62e2d822a7b71b86e934217765edabe2eca7b1d92aa8c5083654b8e",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-08T12:12:01Z",
    "title_canon_sha256": "c0a7bf240daa90e44d55a71fbdb8da623e29af50529a4db4ca867715d76f01e5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.07647",
    "kind": "arxiv",
    "version": 2
  }
}
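
The verification one-liner above can also be written as a short offline script that hashes the record printed here. It uses the same canonicalization as the command: keys sorted, compact separators, and `ensure_ascii=False`, followed by SHA-256 over the UTF-8 bytes. The function name `canonical_sha256` is hypothetical, and whether the digest reproduces the canonical hash listed on this page depends on the service serializing exactly this record.

```python
import hashlib
import json

# The canonical record as shown on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "ba8218a2b62e2d822a7b71b86e934217765edabe2eca7b1d92aa8c5083654b8e",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-08T12:12:01Z",
    "title_canon_sha256": "c0a7bf240daa90e44d55a71fbdb8da623e29af50529a4db4ca867715d76f01e5"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.07647", "kind": "arxiv", "version": 2}
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Sorting keys before hashing is what makes the digest independent of the order in which a mirror happens to store or emit the fields.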