Pith Number

pith:VQMPFD7E

pith:2026:VQMPFD7EHMUMO6FVYUPLBU6POS

not attested · not anchored · not stored · refs pending

MedVIGIL: Evaluating Trustworthy Medical VLMs Under Broken Visual Evidence

Hanqi Jiang, Haozhen Gong, Hui Ren, Hyeokjae Kwon, Jinglei Lv, Junhao Chen, Lifeng Chen, Lin Zhao, Mingyu Kang, Quanzheng Li, Ruiyu Yan, Tianming Liu, Weihang You, Xiang Li, Yi Pan

Medical vision-language models often answer fluently instead of abstaining when the visual evidence is broken; the strongest audited model trails an independent radiologist by 14 points on a new composite score.

arxiv:2605.07919 v2 · 2026-05-08 · cs.CV


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{VQMPFD7EHMUMO6FVYUPLBU6POS}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
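The page does not specify the merge algorithm. As a minimal sketch of what "deterministic" buys a mirror, assume (hypothetically) that each signed event is a JSON object with `ts`, `kind`, and `payload` fields, that an event is identified by the SHA-256 of its canonical JSON, and that events are applied in (`ts`, id) order. Under those assumptions, any mirror holding the same set of events reaches the same state regardless of arrival order or duplicate copies. Signature checking is omitted, and the event kinds below are illustrative only.

```python
import hashlib
import json
from typing import Any


def canonical_bytes(obj: Any) -> bytes:
    # Same canonical form as the verification one-liner: sorted keys, no whitespace, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def event_id(event: dict) -> str:
    # Hypothetical: an event is identified by the SHA-256 of its canonical JSON.
    return hashlib.sha256(canonical_bytes(event)).hexdigest()


def merge(record: dict, events: list[dict]) -> dict:
    """Fold events into a state dict in a total order independent of input order.

    Hypothetical event shape: {"ts": str, "kind": str, "payload": dict}.
    """
    # Deduplicate by content hash so replayed or mirrored copies count once.
    unique = {event_id(e): e for e in events}
    # Total order: timestamp first, content hash as a tie-breaker.
    ordered = sorted(unique.items(), key=lambda item: (item[1]["ts"], item[0]))
    state: dict = {"record": record, "claims": [], "citations": [], "replications": []}
    for _, e in ordered:
        # Hypothetical event kinds; unknown kinds are ignored.
        if e["kind"] == "author_claim":
            state["claims"].append(e["payload"])
        elif e["kind"] == "citation":
            state["citations"].append(e["payload"])
        elif e["kind"] == "replication":
            state["replications"].append(e["payload"])
    return state
```

Shuffling or duplicating the event list yields an identical state, which is the property that lets any mirror recompute the same current state from the bundle.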

Claims

C1 · strongest claim

The independent radiologist scores MCS 83.3 at silent-failure rate 5.8%, leaving a 14.1-point composite headroom above the strongest audited model (Claude Opus 4.7 at 69.2).

C2 · weakest assumption

That the four chosen perturbation types and the 300 clinician-authored cases sufficiently represent the range of broken visual evidence that occurs in real clinical practice.

C3 · one-line summary

MedVIGIL introduces a clinician-supervised benchmark showing medical VLMs frequently give fluent answers on broken visual evidence, with top models 14 points below human radiologists on the composite score.

Receipt and verification

First computed	2026-05-25T02:02:16.508920Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

ac18f28fe43b28c778b5c51eb0d3cf74b15b8cf4701039e8417b656176d45bf0

Aliases

arxiv: 2605.07919 · arxiv_version: 2605.07919v2 · doi: 10.48550/arxiv.2605.07919 · pith_short_12: VQMPFD7EHMUM · pith_short_16: VQMPFD7EHMUMO6FV · pith_short_8: VQMPFD7E
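The values on this page are consistent with the Pith Number being the first 26 characters of the RFC 4648 base32 encoding of the 32-byte canonical hash, with the `pith_short_*` aliases as plain prefixes of it. That derivation is inferred from the listed values, not from a published specification, so treat the sketch below as a consistency check rather than the builder's definition; the function names are hypothetical.

```python
import base64

CANONICAL_HASH = "ac18f28fe43b28c778b5c51eb0d3cf74b15b8cf4701039e8417b656176d45bf0"


def pith_number(canonical_hash_hex: str, length: int = 26) -> str:
    # Inferred, not documented: base32 (RFC 4648 alphabet A-Z, 2-7) of the raw
    # SHA-256 bytes, truncated to `length` characters (so no '=' padding survives).
    return base64.b32encode(bytes.fromhex(canonical_hash_hex)).decode("ascii")[:length]


def short_aliases(number: str) -> dict[str, str]:
    # The pith_short_N aliases listed above are prefixes of the full number.
    return {f"pith_short_{n}": number[:n] for n in (8, 12, 16)}


full = pith_number(CANONICAL_HASH)
print(full)                 # VQMPFD7EHMUMO6FVYUPLBU6POS
print(short_aliases(full))  # {'pith_short_8': 'VQMPFD7E', 'pith_short_12': 'VQMPFD7EHMUM', 'pith_short_16': 'VQMPFD7EHMUMO6FV'}
```

Because 26 base32 characters carry 130 bits, the number commits to roughly the first 16 bytes of the canonical hash.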

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/VQMPFD7EHMUMO6FVYUPLBU6POS \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: ac18f28fe43b28c778b5c51eb0d3cf74b15b8cf4701039e8417b656176d45bf0
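The same check can be done without `jq`, using only the Python standard library. This is a sketch: it assumes the endpoint returns a JSON object with a top-level `canonical_record` key when asked for `application/ld+json`, as the pipeline above implies, and it reuses the pipeline's canonical form (sorted keys, compact separators, UTF-8). The function names are hypothetical.

```python
import hashlib
import json
import urllib.request

PITH_URL = "https://pith.science/pith/VQMPFD7EHMUMO6FVYUPLBU6POS"
EXPECTED = "ac18f28fe43b28c778b5c51eb0d3cf74b15b8cf4701039e8417b656176d45bf0"


def canonical_sha256(record: dict) -> str:
    # Mirror the one-liner: sorted keys, no whitespace, non-ASCII kept as UTF-8.
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def fetch_canonical_record(url: str = PITH_URL, timeout: float = 10.0) -> dict:
    # Same content negotiation as `curl -H 'Accept: application/ld+json'`.
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["canonical_record"]


def verify(url: str = PITH_URL, expected: str = EXPECTED) -> bool:
    # True when the served canonical record hashes to the expected value.
    return canonical_sha256(fetch_canonical_record(url)) == expected
```

Calling `verify()` performs the network fetch. Because keys are sorted before hashing, the key order in the served JSON does not affect the result.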

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "133a9adea01e80b03ff81c4967815dba5865719a1437782d91a280f73ba236a3",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-08T15:55:30Z",
    "title_canon_sha256": "53ac9af92f70e8a5751c4d6922585835fb90690dc00399e153d13ad726421550"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.07919",
    "kind": "arxiv",
    "version": 2
  }
}