Pith Number

pith:VQMPFD7E

pith:2026:VQMPFD7EHMUMO6FVYUPLBU6POS
not attested · not anchored · not stored · refs pending

MedVIGIL: Evaluating Trustworthy Medical VLMs Under Broken Visual Evidence

Hanqi Jiang, Haozhen Gong, Hui Ren, Hyeokjae Kwon, Jinglei Lv, Junhao Chen, Lifeng Chen, Lin Zhao, Mingyu Kang, Quanzheng Li, Ruiyu Yan, Tianming Liu, Weihang You, Xiang Li, Yi Pan

Medical vision-language models fail to refuse answers when visual evidence is broken, trailing radiologists by 14 points on a new composite score.

arxiv:2605.07919 v2 · 2026-05-08 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{VQMPFD7EHMUMO6FVYUPLBU6POS}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

The independent radiologist scores MCS 83.3 at silent-failure rate 5.8%, leaving a 14.1-point composite headroom above the strongest audited model (Claude Opus 4.7 at 69.2).

C2 · weakest assumption

That the four chosen perturbation types and the 300 clinician-authored cases sufficiently represent the range of broken visual evidence that occurs in real clinical practice.

C3 · one-line summary

MedVIGIL introduces a clinician-supervised benchmark showing medical VLMs frequently give fluent answers on broken visual evidence, with top models 14 points below human radiologists on the composite score.

Receipt and verification
First computed 2026-05-25T02:02:16.508920Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

ac18f28fe43b28c778b5c51eb0d3cf74b15b8cf4701039e8417b656176d45bf0

Aliases

arxiv: 2605.07919 · arxiv_version: 2605.07919v2 · doi: 10.48550/arxiv.2605.07919 · pith_short_12: VQMPFD7EHMUM · pith_short_16: VQMPFD7EHMUMO6FV · pith_short_8: VQMPFD7E
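
The short aliases listed here appear to be plain prefixes of the full 26-character Pith Number, and the number itself matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash shown on this page. This is an observation drawn from the values in this record, not a documented rule of the pith-number/v1.0 schema, and the helper names below are hypothetical. A minimal Python sketch using only the standard library:

```python
import base64

# Values copied verbatim from this record.
CANONICAL_HASH = "ac18f28fe43b28c778b5c51eb0d3cf74b15b8cf4701039e8417b656176d45bf0"
PITH_NUMBER = "VQMPFD7EHMUMO6FVYUPLBU6POS"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: first `length` base32 characters of a hex digest."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_number: str) -> dict[str, str]:
    """Hypothetical helper: the 8-, 12-, and 16-character prefixes."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


derived = base32_prefix(CANONICAL_HASH)
aliases = short_aliases(PITH_NUMBER)

# Compare the derived identifier with the one printed on the page.
print("derived matches page:", derived == PITH_NUMBER)
for name, value in aliases.items():
    print(name, value)
```

Because the prefix length (26) is inferred from this single record, treat it as an assumption when applying the sketch to other Pith Numbers.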
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VQMPFD7EHMUMO6FVYUPLBU6POS \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ac18f28fe43b28c778b5c51eb0d3cf74b15b8cf4701039e8417b656176d45bf0
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "133a9adea01e80b03ff81c4967815dba5865719a1437782d91a280f73ba236a3",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-08T15:55:30Z",
    "title_canon_sha256": "53ac9af92f70e8a5751c4d6922585835fb90690dc00399e153d13ad726421550"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.07919",
    "kind": "arxiv",
    "version": 2
  }
}
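
The verification pipeline shown earlier can also be reproduced offline against the record above. The sketch below mirrors the Python step of that command: it serializes the record with sorted keys and compact separators, encodes it as UTF-8, and takes the SHA-256 digest. The function name `canonical_sha256` is an illustrative assumption, and treating this serialization as the normative canonical form is inferred from the command on this page rather than from a published specification.

```python
import hashlib
import json

# Expected digest, copied from the "Canonical hash" field of this record.
EXPECTED = "ac18f28fe43b28c778b5c51eb0d3cf74b15b8cf4701039e8417b656176d45bf0"

# Canonical record, copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "133a9adea01e80b03ff81c4967815dba5865719a1437782d91a280f73ba236a3",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-08T15:55:30Z",
    "title_canon_sha256": "53ac9af92f70e8a5751c4d6922585835fb90690dc00399e153d13ad726421550"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.07919",
    "kind": "arxiv",
    "version": 2
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """Hash a record the way the page's one-liner does (assumed canonical form)."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON was pretty-printed or ordered in transit. If the comparison fails, the copied record may differ from the one served by the API.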