Pith Number

pith:FMNA6MFM

pith:2026:FMNA6MFM4LMM675Y5NEUOJSFEW

not attested · not anchored · not stored · refs pending

Quality-Conditioned Agreement in Automated Short Answer Scoring: Mid-Range Degradation and the Impact of Task-Specific Adaptation

Abigail Victoria Gurin Schleifer, Asaf Salman, Beata Beigman Klebanov, Giora Alexandron, Moriah Ariely

AI models for scoring short answers agree well with experts on fully correct and incorrect responses but show major degradation on mid-range ones, with less degradation after more task-specific adaptation.

arxiv:2605.07647 v2 · 2026-05-08 · cs.CL · cs.AI


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{FMNA6MFM4LMM675Y5NEUOJSFEW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

All AI models perform well on fully correct and fully incorrect responses, but exhibit substantial degradation on mid-range responses. This mid-range degradation is conditioned on task-specific adaptation: It is most severe in few-shot LLMs with few examples and decreases as task-specific data increases, with fine-tuned encoder models performing best.

C2 · weakest assumption

The ground-truth scores assigned by a single biology education expert accurately capture the nuanced interpretation required for mid-range responses and serve as a stable reference for measuring model agreement.

C3 · one-line summary

AI short-answer scorers show mid-range quality degradation that lessens with more task-specific adaptation, while human agreement stays stable across the quality spectrum.

Receipt and verification

First computed	2026-05-26T01:03:32.698666Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

2b1a0f30ace2d8cf7fb8eb49472645259e0bf7a9a8941c1acc23b2b75bf26860

Aliases

arxiv: 2605.07647 · arxiv_version: 2605.07647v2 · doi: 10.48550/arxiv.2605.07647 · pith_short_12: FMNA6MFM4LMM · pith_short_16: FMNA6MFM4LMM675Y · pith_short_8: FMNA6MFM
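
The identifiers on this page appear to be related in a simple way: Base32-encoding the raw bytes of the canonical hash (standard RFC 4648 alphabet) and keeping the first 26 characters reproduces the full Pith Number, and the pith_short_* aliases are its 8-, 12-, and 16-character prefixes. This is an observation drawn from the values shown here, not a documented rule of the pith-number/v1.0 schema. The function names `pith_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "2b1a0f30ace2d8cf7fb8eb49472645259e0bf7a9a8941c1acc23b2b75bf26860"
PITH_NUMBER = "FMNA6MFM4LMM675Y5NEUOJSFEW"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and truncate (assumed derivation)."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict[str, str]:
    """Build the short aliases as plain prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


derived = pith_from_hash(CANONICAL_HASH)
aliases = short_aliases(derived)

print(derived == PITH_NUMBER)  # True for the values on this page
print(aliases)
```

If the relationship holds in general, a short alias can be checked against a full identifier with a prefix comparison, without contacting the resolver.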

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/FMNA6MFM4LMM675Y5NEUOJSFEW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2b1a0f30ace2d8cf7fb8eb49472645259e0bf7a9a8941c1acc23b2b75bf26860
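
The same check can be sketched as a small Python script. It mirrors the one-liner above: request the JSON-LD representation, take the `canonical_record` field, serialize it with sorted keys, compact separators, and unescaped non-ASCII characters, then hash the UTF-8 bytes with SHA-256. The function names are hypothetical, and the network call is only defined here, not executed.

```python
import hashlib
import json
import urllib.request

# Values copied from this page.
EXPECTED = "2b1a0f30ace2d8cf7fb8eb49472645259e0bf7a9a8941c1acc23b2b75bf26860"
URL = "https://pith.science/pith/FMNA6MFM4LMM675Y5NEUOJSFEW"


def canonical_hash(record: dict) -> str:
    """Hash a record using the same serialization as the shell one-liner."""
    blob = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def fetch_canonical_record(url: str = URL) -> dict:
    """Fetch the resolver document and return its canonical_record field."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["canonical_record"]


def verify(url: str = URL, expected: str = EXPECTED) -> bool:
    """Return True when the recomputed hash equals the expected hash."""
    return canonical_hash(fetch_canonical_record(url)) == expected
```

Calling `verify()` requires network access. Because the serialization sorts keys, `canonical_hash` gives the same digest regardless of the key order in which the record was received.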

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "ba8218a2b62e2d822a7b71b86e934217765edabe2eca7b1d92aa8c5083654b8e",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-08T12:12:01Z",
    "title_canon_sha256": "c0a7bf240daa90e44d55a71fbdb8da623e29af50529a4db4ca867715d76f01e5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.07647",
    "kind": "arxiv",
    "version": 2
  }
}