Pith Number

pith:WHD2ZAOC

pith:2025:WHD2ZAOCEHRY3P4PBHP5FF2RDY

not attested · not anchored · not stored · refs resolved

Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

Jing Liao, Junhao Cheng, Teng Wang, Ying Shan, Yixiao Ge, Yuying Ge

Multimodal models perceive video details but fail to integrate scattered clues, scoring at most 45 percent on a new Holmes-inspired benchmark.

arxiv:2505.21374 v1 · 2025-05-27 · cs.CV

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{WHD2ZAOCEHRY3P4PBHP5FF2RDY}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
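
The page does not specify the merge algorithm, so the following is only a sketch of one way a mirror could reach an order-independent result: deduplicate events by ID, sort them by a total order (timestamp, then ID), and fold them onto the canonical record. The event fields (`id`, `ts`, `field`, `value`), the sample events, and the last-writer-wins rule are assumptions made for illustration, not Pith's actual schema or algorithm. Signature checking is left out entirely.

```python
from typing import Iterable


def merge_state(canonical: dict, events: Iterable[dict]) -> dict:
    """Fold events onto a canonical record in a deterministic order.

    Hypothetical event shape: {"id": str, "ts": str, "field": str, "value": ...}.
    """
    # Deduplicate by event id so replayed or mirrored copies have no effect.
    unique = {e["id"]: e for e in events}
    # Total order: timestamp first, event id as a tie-breaker.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state = dict(canonical)
    for event in ordered:
        state[event["field"]] = event["value"]  # last writer wins (assumed rule)
    return state


# Illustrative data only; these are not real Pith events.
canonical = {"source": "arxiv:2505.21374"}
events = [
    {"id": "e2", "ts": "2026-05-18T00:00:00Z", "field": "citations", "value": "open"},
    {"id": "e1", "ts": "2026-05-17T23:38:14Z", "field": "author_claim", "value": "open"},
]

# Any arrival order, with or without duplicates, yields the same state.
state_a = merge_state(canonical, events)
state_b = merge_state(canonical, list(reversed(events)) + events)
print(state_a == state_b)
```

The point of the design is that the result depends only on the set of events, never on the order in which a mirror happened to receive them.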

Claims

C1 · strongest claim

Our comprehensive evaluation of state-of-the-art MLLMs reveals that, while these models generally excel at visual perception, they encounter substantial difficulties with integrating information and often miss critical clues. For example, the best-performing model, Gemini-2.5-Pro, achieves an accuracy of only 45%, with most models scoring below 40%.

C2 · weakest assumption

The benchmark assumes that its seven manually designed tasks, drawn from suspense films, genuinely require and measure active search, integration, and analysis of multiple clues in a manner comparable to human expert reasoning.

C3 · one-line summary

The Video-Holmes benchmark shows that top MLLMs reach at most 45% accuracy on tasks that require integrating multiple clues from suspense films, in contrast to existing perception-focused tests.

References

51 extracted · 51 resolved · 21 Pith anchors

[1] Chain-of-thought prompting elicits reasoning in large language models 2022

[2] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models 2024 · arXiv:2402.03300

[3] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning 2025 · arXiv:2501.12948

[4] Introducing OpenAI o1 2024

[5] OpenAI o3 2025

Formal links

3 machine-checked theorem links

Cited by

32 papers in Pith

Learning to Solve, Forgetting to Retain: Correct-Set Turnover in RLVR

AVI-Bench: Toward Human-like Audio-Visual Intelligence of Omni-MLLMs

MOSS-Video-Preview: Toward Real-Time Video Understanding via Cross-Attention

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

SVHalluc: Benchmarking Speech-Vision Hallucination in Audio-Visual Large Language Models

Receipt and verification

First computed	2026-05-17T23:38:14.952213Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

b1c7ac81c221e38dbf8f09dfd297511e28e68d6946a16ac84740f6bd226f0367

Aliases

arxiv: 2505.21374 · arxiv_version: 2505.21374v1 · doi: 10.48550/arxiv.2505.21374 · pith_short_12: WHD2ZAOCEHRY · pith_short_16: WHD2ZAOCEHRY3P4P · pith_short_8: WHD2ZAOC
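
The identifiers on this page appear to be related in a simple way: Base32-encoding the first 16 bytes of the canonical hash reproduces the 26-character Pith Number, and the three short aliases are its first 8, 12, and 16 characters. The sketch below checks that relationship using only values shown here. It is an observation about this one record, not a documented part of the schema, and the helper names are hypothetical.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "b1c7ac81c221e38dbf8f09dfd297511e28e68d6946a16ac84740f6bd226f0367"
PITH_NUMBER = "WHD2ZAOCEHRY3P4PBHP5FF2RDY"


def pith_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest, without '=' padding.

    Assumption: inferred from the values on this page, not from a spec.
    """
    head = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(head).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict:
    """Return the 8-, 12- and 16-character prefixes listed under Aliases."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


derived = pith_from_hash(CANONICAL_HASH)
aliases = short_aliases(derived)
print(derived == PITH_NUMBER)
print(aliases)
```

If the relationship holds in general, a short alias can be checked against a canonical hash without contacting the resolver, although shorter prefixes carry a higher risk of collision.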

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/WHD2ZAOCEHRY3P4PBHP5FF2RDY \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b1c7ac81c221e38dbf8f09dfd297511e28e68d6946a16ac84740f6bd226f0367

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "7d62d4aba317088c9ae2a9712056750f44141128f5c8fcb45341f9e87195b8f1",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-05-27T16:05:01Z",
    "title_canon_sha256": "1037a1b2b279b5f0742dc6dfa56f6ffc64357cdb3e474d708d8ec7e95ff08200"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2505.21374",
    "kind": "arxiv",
    "version": 1
  }
}
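
The hashing step of the verification command can also be run offline against the record as displayed. The sketch below uses the same serialization settings as that command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes); the function name `canonical_sha256` is hypothetical. Whether the resulting digest equals the canonical hash shown on this page depends on the displayed JSON matching the resolver's output exactly, which is not verified here.

```python
import hashlib
import json

# Canonical record as displayed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "7d62d4aba317088c9ae2a9712056750f44141128f5c8fcb45341f9e87195b8f1",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-05-27T16:05:01Z",
        "title_canon_sha256": "1037a1b2b279b5f0742dc6dfa56f6ffc64357cdb3e474d708d8ec7e95ff08200",
    },
    "schema_version": "1.0",
    "source": {"id": "2505.21374", "kind": "arxiv", "version": 1},
}


def canonical_sha256(record: dict) -> str:
    """Hash the compact, key-sorted JSON form of a record."""
    blob = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)  # compare by eye with the canonical hash shown on this page
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON was formatted or in what order its keys were written.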