Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

2026 · cs.CL · arXiv 2601.14152

1 Pith paper cites this work. Polarity classification is still indexing.


abstract

Large language models exhibit surprising sensitivity to the structure of the prompt, but the mechanisms underlying this sensitivity remain poorly understood. In this work, we conduct an in-depth investigation of a striking case: in multiple-choice question answering, placing the context before the question and options (CQO) outperforms the reverse order (QOC) by over 14 percentage points, consistently across a wide range of models and datasets. Through systematic architectural analysis, we identify causal attention as the core mechanism: in QOC prompts, the causal mask prevents option tokens from attending to the context, creating an information bottleneck in which the context is invisible to the options.
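The masking argument in the abstract can be illustrated with a toy sketch. This is not the paper's analysis code: it only builds the standard lower-triangular causal mask and checks whether any option position may attend to any context position under each ordering. The segment lengths and the helper names (`causal_mask`, `segment_labels`, `options_see_context`) are hypothetical, chosen for illustration.

```python
def causal_mask(n):
    """Return an n x n boolean matrix.

    mask[i][j] is True when position i may attend to position j,
    which under causal attention means j <= i.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def segment_labels(order, lengths):
    """Expand an order string such as "CQO" into one label per token."""
    labels = []
    for seg in order:
        labels.extend([seg] * lengths[seg])
    return labels


def options_see_context(order, lengths):
    """True if at least one option token can attend to a context token."""
    labels = segment_labels(order, lengths)
    mask = causal_mask(len(labels))
    return any(
        mask[i][j]
        for i, label_i in enumerate(labels) if label_i == "O"
        for j, label_j in enumerate(labels) if label_j == "C"
    )


# Hypothetical token counts for context (C), question (Q), options (O).
lengths = {"C": 4, "Q": 3, "O": 2}

print(options_see_context("CQO", lengths))  # True
print(options_see_context("QOC", lengths))  # False
```

In the QOC ordering every context position comes after every option position, so the mask blocks all option-to-context attention, which is the bottleneck the abstract describes.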

representative citing papers

PARTREP: Learning What to Repeat for Decoder-only LLMs

cs.CL · 2026-07-02 · conditional · novelty 6.0

PartRep selects high-NLL tokens via a lightweight early-exit gate for partial prompt repetition, retaining most full-repetition gains at 59.4% KV cache and 79% prefill FLOPs on eight benchmarks.

