pith. machine review for the scientific record

arxiv: 2604.12119 · v1 · submitted 2026-04-13 · 💻 cs.CV · cs.LG

Beyond Perception Errors: Semantic Fixation in Large Vision-Language Models

Pith reviewed 2026-05-10 15:01 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords semantic fixation · vision-language models · VLMs · rule mapping · prompt intervention · abstract strategy games · defamiliarization · activation steering

The pith

Large vision-language models favor familiar semantic rules over explicitly prompted alternatives even on identical visual inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that VLMs exhibit semantic fixation by defaulting to standard interpretations of rules in visual strategy tasks despite clear instructions for equally valid inverses. Using a new benchmark with four abstract games, it tests the same terminal board positions under paired rule sets and finds consistently higher accuracy for standard rules across 14 models. Neutral alias prompts reduce this gap while semantically loaded aliases restore it, and post-training on one rule set transfers well only to matching rules. Late-layer activation steering recovers some lost performance on inverse rules, and the pattern holds in external defamiliarization tests.

Core claim

Semantic fixation is the tendency of VLMs to preserve a default interpretation even when the prompt specifies an alternative, equally valid mapping. The VLM-Fix benchmark isolates this by evaluating identical terminal board states under standard and inverse rule formulations in abstract strategy games. Across 14 models, accuracy favors standard rules; neutral aliases narrow the inverse gap while loaded aliases reopen it. Post-training on one rule improves same-rule transfer but impairs opposite-rule transfer, and late-layer steering partially edits the error.
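The paired-rule evaluation described above reduces to a simple accuracy comparison over identical board states. A minimal sketch, with hypothetical answers and labels (not the paper's released code or data):

```python
# Hedged sketch: measuring a semantic-fixation gap on paired rule prompts.
# All names and data below are illustrative, not from the paper's release.

def accuracy(answers, gold):
    """Fraction of model answers matching the gold labels."""
    return sum(a == g for a, g in zip(answers, gold)) / len(gold)

def fixation_gap(std_answers, inv_answers, std_gold, inv_gold):
    """Accuracy on standard rules minus accuracy on inverse rules.

    Both answer lists cover the SAME terminal board states, so a positive
    gap cannot be blamed on perception: only the rule text differs.
    """
    return accuracy(std_answers, std_gold) - accuracy(inv_answers, inv_gold)

# Toy example: 4 identical boards judged under standard vs inverse rules.
std_gold = ["X wins", "O wins", "draw", "X wins"]
inv_gold = ["O wins", "X wins", "draw", "O wins"]     # inverse rule flips winners
std_answers = ["X wins", "O wins", "draw", "X wins"]  # all correct under standard
inv_answers = ["X wins", "X wins", "draw", "O wins"]  # one fixated default answer

gap = fixation_gap(std_answers, inv_answers, std_gold, inv_gold)
print(gap)  # 1.0 - 0.75 = 0.25
```

The design choice that carries the isolation argument is visible here: the board states (and hence the perception problem) are held fixed, so the gap is attributable to the rule mapping.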

What carries the argument

Semantic fixation: the preservation of a default rule interpretation despite a prompt-specified alternative, isolated by comparing accuracy on identical board states under standard versus inverse rules in the VLM-Fix benchmark.

If this is right

  • Accuracy is higher on standard rules than on inverse rules for the same board states.
  • Neutral alias prompts substantially reduce the performance gap on inverse rules.
  • Semantically loaded alias prompts reopen the inverse-rule accuracy gap.
  • Training on one rule set improves transfer to the same rule but hurts transfer to the opposite rule.
  • Late-layer activation steering partially recovers performance on inverse-rule tasks.
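The last bullet, late-layer activation steering, can be sketched as adding a donor-minus-fixated difference vector to a late residual activation. This is a minimal numpy illustration under assumed shapes and an assumed steering coefficient, not the paper's protocol:

```python
import numpy as np

# Illustrative sketch of late-layer activation steering.
# Assumption: we have cached late-layer hidden states from runs where the
# model applied the inverse rule correctly ("donor") and runs where it
# fixated on the standard rule ("fixated"). Shapes and alpha are hypothetical.

rng = np.random.default_rng(0)
hidden_dim = 8

donor = rng.normal(size=(16, hidden_dim))    # correct inverse-rule activations
fixated = rng.normal(size=(16, hidden_dim))  # fixated activations, same boards

# Steering vector: mean difference between the two activation populations.
steer = donor.mean(axis=0) - fixated.mean(axis=0)

def apply_steering(activation, vector, alpha=1.0):
    """Shift an activation along the steering direction by alpha."""
    return activation + alpha * vector

h = fixated.mean(axis=0)                       # a representative fixated state
h_steered = apply_steering(h, steer, alpha=0.8)

# The steered activation moves toward the donor mean.
d_before = np.linalg.norm(h - donor.mean(axis=0))
d_after = np.linalg.norm(h_steered - donor.mean(axis=0))
print(round(d_after / d_before, 2))  # 0.2
```

In a real model the shift would be applied inside a forward hook at a chosen late layer; the layer index and coefficient would be swept, as the paper's layerwise figures suggest.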

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fixation mechanism may limit model adaptability when real-world instructions require overriding common priors, such as modified safety rules or novel game variants.
  • Activation steering or neutral rephrasing could be tested as general tools for improving rule flexibility in other multimodal reasoning settings.
  • The observed transfer patterns suggest that rule knowledge in VLMs is stored in a form tied to surface semantics rather than abstract mappings.
  • If the pattern holds in non-game domains, current fine-tuning practices may systematically reduce a model's ability to handle defamiliarized instructions.

Load-bearing premise

The paired standard and inverse rule formulations in the abstract games cleanly separate semantic fixation from perception or prompt-parsing failures.

What would settle it

Finding no consistent accuracy difference between standard and inverse rule versions on the VLM-Fix dataset across multiple models would falsify the claimed semantic-fixation gap.
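That falsification test amounts to checking whether the per-board accuracy difference is distinguishable from zero. A simple bootstrap sketch over hypothetical per-item correctness flags (not the paper's data or statistical procedure):

```python
import random

# Illustrative falsification check: is the standard-vs-inverse accuracy gap
# distinguishable from zero? Correctness flags below are made up.

random.seed(0)
std_correct = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]  # correctness under standard rules
inv_correct = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]  # same boards, inverse rules

diffs = [s - i for s, i in zip(std_correct, inv_correct)]
observed_gap = sum(diffs) / len(diffs)

# Bootstrap the gap: resample boards with replacement.
boot_gaps = []
for _ in range(10_000):
    sample = random.choices(diffs, k=len(diffs))
    boot_gaps.append(sum(sample) / len(sample))

boot_gaps.sort()
lo, hi = boot_gaps[250], boot_gaps[-251]  # approximate 95% interval
print(observed_gap, (lo, hi))
# If the interval covers 0 consistently across models, the claimed
# semantic-fixation gap is falsified.
```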

Figures

Figures reproduced from arXiv: 2604.12119 by Md Tanvirul Alam.

Figure 1. Tic-Tac-Toe example from VLM-Fix. Left: base, tag-alias, and tag-semantic prompt…
Figure 2. Compact Reversi example from VLM-Fix. Left: the three rendering variants…
Figure 3. Representative Animals example from VLMBias. Left: the Base and Flip images. Right: the corresponding Base and Alias prompts…
Figure 4. VLM-Fix splits (D1–D3) for Qwen2.5-VL-7B (left) and Qwen2.5-VL-3B…
Figure 5. Synthetic leg-count transfer for Qwen2.5-VL-7B (left) and Qwen2.5-VL…
Figure 6. Layerwise activation-steering results on VLM-Fix for Qwen2.5-VL-7B across Tic…
Figure 7. Layerwise SFT→Base activation-steering results on VLMBias Animals for Qwen2.5-VL-7B, Qwen2.5-VL-3B, Molmo2-4B, and InternVL3.5-4B (left to right). …the target activation toward a matched donor representation. Matching is determined by a lightweight router that predicts the relevant rule/answer bucket before patching. Full protocol details, split construction, and routing definitions are provided in Appendi…
Figure 8. Representative VLM-Fix inputs across four games. In each column, the top three…
Figure 9. Representative examples from the four VLMBias subsets. Top row: Base images. Second row: Flip images. Third row: Base prompts. Fourth row: Alias prompts.
Figure 10. Additional synthetic glyph examples from the procedurally rendered leg-counting…
Figure 11. Post-training performance on the three VLM-Fix transfer splits (D1–D3) for…
Figure 12. Post-training transfer from the synthetic leg-counting dataset for additional…
Figure 13. Qwen2.5-VL-3B layerwise activation steering on VLM-Fix across Tic-Tac-Toe,…
Figure 14. Molmo2-4B layerwise activation steering on VLM-Fix across Tic-Tac-Toe, Reversi,…
Figure 15. Molmo2-8B layerwise activation steering on VLM-Fix across Tic-Tac-Toe, Reversi,…
Figure 16. InternVL3.5-4B layerwise activation steering on VLM-Fix across Tic-Tac-Toe,…
Original abstract

Large vision-language models (VLMs) often rely on familiar semantic priors, but existing evaluations do not cleanly separate perception failures from rule-mapping failures. We study this behavior as semantic fixation: preserving a default interpretation even when the prompt specifies an alternative, equally valid mapping. To isolate this effect, we introduce VLM-Fix, a controlled benchmark over four abstract strategy games that evaluates identical terminal board states under paired standard and inverse rule formulations. Across 14 open and closed VLMs, accuracy consistently favors standard rules, revealing a robust semantic-fixation gap. Prompt interventions support this mechanism: neutral alias prompts substantially narrow the inverse-rule gap, while semantically loaded aliases reopen it. Post-training is strongly rule-aligned: training on one rule improves same-rule transfer but hurts opposite-rule transfer, while joint-rule training improves broader transfer. To test external validity beyond synthetic games, we evaluate analogous defamiliarization interventions on VLMBias and observe the same qualitative pattern. Finally, late-layer activation steering partially recovers degraded performance, indicating that semantic-fixation errors are at least partly editable in late representations. Project page, code, and dataset available at https://maveryn.github.io/vlm-fix/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper introduces 'semantic fixation' as a failure mode in VLMs where models preserve default semantic priors even when prompts explicitly specify alternative, valid mappings. It presents the VLM-Fix benchmark over four abstract strategy games, evaluating identical terminal board states under paired standard vs. inverse rule formulations across 14 open and closed VLMs. Results show consistent accuracy favoring standard rules; neutral alias prompts narrow the inverse-rule gap while semantically loaded aliases reopen it. Additional experiments examine post-training transfer effects, extension to VLMBias, and partial recovery via late-layer activation steering. Code, dataset, and project page are released.

Significance. If the central isolation of semantic fixation holds, the work identifies a reproducible, intervention-sensitive bias in VLMs' rule interpretation that is distinct from perception or general reasoning failures, with implications for robust deployment in rule-governed tasks. Strengths include the controlled paired-rule design on synthetic games, the qualitative replication on VLMBias, the open release of code/dataset for reproducibility, and the demonstration that both prompt-level and representation-level interventions can modulate the effect.

major comments (2)
  1. [§3] §3 (VLM-Fix benchmark construction): The claim that accuracy differences isolate semantic fixation from parsing or linguistic confounds requires that standard and inverse rule prompts are matched on surface features. Inverse formulations are likely to contain additional negations, conditionals, or non-canonical phrasing, which could independently raise token-level or syntactic processing costs and produce the observed gap even without fixation on priors. The alias-prompt results are consistent with the mechanism but do not include explicit ablations or metrics (e.g., prompt length, parse depth, or lexical complexity) to rule out this alternative.
  2. [§4.2 and §5] §4.2 (prompt interventions) and §5 (post-training): While neutral aliases narrow the inverse-rule gap and loaded aliases reopen it, the paper does not report quantitative controls confirming that the alias prompts preserve equivalent syntactic complexity and token statistics to the original formulations. Without such matching, the intervention evidence remains compatible with a parsing-difficulty account rather than a pure semantic-fixation account.
minor comments (3)
  1. Report exact per-model sample sizes, number of terminal states per game, error bars or confidence intervals, and the statistical tests used for the accuracy-gap claims.
  2. Clarify the precise procedure for generating and verifying that terminal board states are identical and rule-consistent under both standard and inverse formulations.
  3. In the activation-steering experiments, specify the exact layers, steering coefficients, and how the steering vectors were derived.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address the major comments point by point below, agreeing that additional quantitative controls on prompt surface features would strengthen the isolation of semantic fixation. We have incorporated these analyses in the revision.

Point-by-point responses
  1. Referee: [§3] §3 (VLM-Fix benchmark construction): The claim that accuracy differences isolate semantic fixation from parsing or linguistic confounds requires that standard and inverse rule prompts are matched on surface features. Inverse formulations are likely to contain additional negations, conditionals, or non-canonical phrasing, which could independently raise token-level or syntactic processing costs and produce the observed gap even without fixation on priors. The alias-prompt results are consistent with the mechanism but do not include explicit ablations or metrics (e.g., prompt length, parse depth, or lexical complexity) to rule out this alternative.

    Authors: We agree that explicit matching on surface features is necessary to isolate semantic fixation from potential parsing costs. The VLM-Fix prompts were intentionally constructed with parallel syntactic structures, comparable sentence lengths, and minimal unnecessary negations or conditionals for the inverse rules. However, we acknowledge that the original manuscript did not report quantitative metrics for these properties. In the revised version we add a dedicated analysis (new Table and paragraph in §3) reporting token counts, number of negations/conditionals, dependency-parse depth, and lexical complexity (type-token ratio and Flesch-Kincaid grade level) for each paired standard/inverse formulation. These metrics confirm close matching, with average differences too small to account for the observed accuracy gaps across 14 models. The neutral-alias results, which alter only semantic content while preserving syntax, provide further evidence against a pure parsing-difficulty explanation. revision: yes

  2. Referee: [§4.2 and §5] §4.2 (prompt interventions) and §5 (post-training): While neutral aliases narrow the inverse-rule gap and loaded aliases reopen it, the paper does not report quantitative controls confirming that the alias prompts preserve equivalent syntactic complexity and token statistics to the original formulations. Without such matching, the intervention evidence remains compatible with a parsing-difficulty account rather than a pure semantic-fixation account.

    Authors: We thank the referee for noting this gap in the reported controls. The alias prompts were designed to keep syntactic structure, length, and token statistics as close as possible to the base formulations, with changes limited to the semantic descriptors of the rules. In the revision we add explicit quantitative comparisons (new paragraph and supplementary table in §4.2) of token length, syntactic complexity (average dependency depth), and token-frequency statistics between original and alias prompts. These show high equivalence, supporting that the narrowing or reopening of the fixation gap is driven by semantic content. The same structural consistency holds for the post-training experiments in §5, where rule prompt phrasing is held fixed across training conditions. revision: yes
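The surface-feature matching the rebuttal promises can be approximated with simple lexical metrics such as token count, negation count, and type-token ratio. A sketch using naive whitespace tokenization and an illustrative negation list (a real analysis would use a proper tokenizer and dependency parser):

```python
# Sketch of lexical matching checks for paired rule prompts (illustrative).

NEGATIONS = {"not", "no", "never", "cannot", "n't"}  # assumed negation cues

def lexical_profile(prompt: str) -> dict:
    tokens = prompt.lower().split()
    return {
        "tokens": len(tokens),
        "negations": sum(t.strip(".,") in NEGATIONS for t in tokens),
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }

# Hypothetical paired formulations, not the paper's actual prompts.
standard = "three in a row wins the game for the player who placed them"
inverse = "three in a row loses the game for the player who placed them"

p_std = lexical_profile(standard)
p_inv = lexical_profile(inverse)

# Well-matched pairs should differ only in the rule verb, not in length,
# negation count, or lexical diversity.
print(p_std)
print(p_inv)
```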

Circularity Check

0 steps flagged

No circularity: purely empirical benchmark with no derivations or self-referential predictions

full rationale

The paper introduces VLM-Fix as a controlled empirical benchmark evaluating VLMs on paired standard/inverse rule formulations across abstract games, with results reported as observed accuracy gaps and intervention effects. No equations, fitted parameters, or derivation chains appear in the abstract or described methodology. Claims rest on direct measurements of model behavior rather than any self-definitional mapping, renamed empirical pattern, or load-bearing self-citation that reduces the central result to its own inputs. The work is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that the chosen games and rule inversions isolate semantic priors without confounding factors in visual parsing or instruction following; no free parameters or new mathematical entities are introduced.

axioms (1)
  • domain assumption Models can parse and apply explicit textual rule descriptions to visual board states
    Implicit in the benchmark design that differences arise from semantic fixation rather than failure to understand the prompt text.
invented entities (1)
  • semantic fixation no independent evidence
    purpose: Label for the observed preference for default interpretations over prompt-specified alternatives
    New descriptive term for the measured behavior; no independent evidence provided beyond the benchmark results.

pith-pipeline@v0.9.0 · 5501 in / 1324 out tokens · 41653 ms · 2026-05-10T15:01:45.741052+00:00 · methodology

