
MMErroR: A Benchmark for Erroneous Reasoning in Vision-Language Models

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

Recent advances in Vision-Language Models (VLMs) have improved performance in multi-modal learning, raising the question of whether these models truly understand the content they process. Crucially, can VLMs detect when a reasoning process is wrong and identify its error type? To answer this, we present MMErroR, a multi-modal benchmark of 1997 samples, each embedding a single coherent reasoning error. These samples span 24 subdomains across six top-level domains, ensuring broad coverage and taxonomic richness. Unlike existing benchmarks that focus on answer correctness, MMErroR targets a process-level, error-centric evaluation that requires models to detect incorrect reasoning and classify the error type within both visual and linguistic contexts. We evaluate 12 representative VLMs, and even the best model, Gemini-3-Pro-Preview, classifies the error correctly in only 66.65% of cases, underscoring the challenge of identifying erroneous reasoning. Furthermore, the ability to accurately identify errors offers valuable insights into the capabilities of multi-modal models. Project Page: https://mmerror-benchmark.github.io

fields

cs.CV: 3 · cs.LG: 2

years

2026: 5


representative citing papers

Modality-Decoupled Online Recursive Editing

cs.LG · 2026-05-19 · conditional · novelty 7.0

M-ORE decouples text and visual update statistics in MLLMs and applies recursive low-rank edits in an orthogonal subspace to reduce cross-modal conflict and long-horizon interference.

Neutral-Reference Prompting for Vision-Language Models

cs.CV · 2026-05-15 · unverdicted · novelty 7.0

NeRP corrects asymmetric class confusion in VLMs for unseen classes by combining neutral-prompt priors with sample likelihood to flip predictions on confusable pairs, improving new-class accuracy while preserving base-class performance.

citing papers explorer


  • Modality-Decoupled Online Recursive Editing cs.LG · 2026-05-19 · conditional · none · ref 15 · internal anchor
