MMMU -Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

· 2025 · DOI 10.18653/v1/2025.acl-long.736

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

open at publisher browse 5 citing papers

representative citing papers

Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

BICR uses blind-image contrastive ranking on frozen LVLM hidden states to train a lightweight probe that penalizes confidence on blacked-out inputs, yielding top calibration and discrimination across five models and multiple tasks at low parameter cost.

MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs

cs.AI · 2026-05-08 · conditional · novelty 7.0 · 2 refs

MemQ integrates Q-learning with eligibility traces over provenance DAGs to assign credit in self-evolving memory agents, outperforming baselines on all six tested agent benchmarks with largest gains on deep multi-step tasks.

SASAV: Self-Directed Agent for Scientific Analysis and Visualization

cs.GR · 2026-04-03 · unverdicted · novelty 7.0

SASAV introduces the first fully autonomous multi-agent system for scientific data analysis and visualization that operates without external prompting or human-in-the-loop feedback.

Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

A unified learnable KV eviction policy with cross-layer calibration reduces memory and matches or exceeds full-cache performance on long-context tasks by retaining useful tokens and limiting attention dilution.

AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

AICA-Bench evaluates 23 VLMs on affective image analysis, identifies weak intensity calibration and shallow descriptions as limitations, and proposes training-free Grounded Affective Tree Prompting to improve performance.

citing papers explorer

Showing 5 of 5 citing papers.

Grounded or Guessing? LVLM Confidence Estimation via Blind-Image Contrastive Ranking cs.CL · 2026-05-11 · unverdicted · none · ref 35
BICR uses blind-image contrastive ranking on frozen LVLM hidden states to train a lightweight probe that penalizes confidence on blacked-out inputs, yielding top calibration and discrimination across five models and multiple tasks at low parameter cost.
MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs cs.AI · 2026-05-08 · conditional · none · ref 70 · 2 links
MemQ integrates Q-learning with eligibility traces over provenance DAGs to assign credit in self-evolving memory agents, outperforming baselines on all six tested agent benchmarks with largest gains on deep multi-step tasks.
SASAV: Self-Directed Agent for Scientific Analysis and Visualization cs.GR · 2026-04-03 · unverdicted · none · ref 53
SASAV introduces the first fully autonomous multi-agent system for scientific data analysis and visualization that operates without external prompting or human-in-the-loop feedback.
Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction cs.LG · 2026-05-10 · unverdicted · none · ref 32
A unified learnable KV eviction policy with cross-layer calibration reduces memory and matches or exceeds full-cache performance on long-context tasks by retaining useful tokens and limiting attention dilution.
AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis cs.CV · 2026-04-07 · unverdicted · none · ref 50
AICA-Bench evaluates 23 VLMs on affective image analysis, identifies weak intensity calibration and shallow descriptions as limitations, and proposes training-free Grounded Affective Tree Prompting to improve performance.

MMMU -Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

fields

years

verdicts

representative citing papers

citing papers explorer