pith. machine review for the scientific record.

arxiv: 2604.21139 · v1 · submitted 2026-04-22 · 💻 cs.CL · cs.LG

Recognition: unknown

Slot Machines: How LLMs Keep Track of Multiple Entities

Jack Lindsey, Paul C. Bogdan

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 23:44 UTC · model grok-4.3

classification 💻 cs.CL cs.LG
keywords entity tracking · language models · probing methods · residual stream · entity binding · relational inference · multi-entity representation

The pith

Language models encode current-entity and prior-entity information in separate orthogonal slots within single token activations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how language models bind multiple entities to their attributes while processing text. It develops a multi-slot probing method to extract both the currently described entity and the immediately preceding one from the same token's residual stream. These turn out to be stored in largely independent directions that the model deploys for distinct jobs. The current-entity slot handles direct factual questions, while the prior-entity slot supports relational inferences such as tracking what came after whom and spotting conflicts between adjacent descriptions. This division reveals a gap between what information is present in the activations and what the model actually uses during generation.

Core claim

Information about the currently described entity and the immediately preceding one is encoded in separate and largely orthogonal current-entity and prior-entity slots. The current-entity slot is used for explicit factual retrieval, whereas the prior-entity slot supports relational inferences such as entity-level induction and conflict detection between adjacent entities. Only the current-entity slot is consulted for factual questions even when answers are linearly decodable from the prior-entity slot as well. Open-weight models perform near chance on syntax that requires two subject-verb-object bindings on a single token, while recent frontier models succeed at the same task.

What carries the argument

Multi-slot probing that disentangles a single token's residual stream activation into current-entity and prior-entity slots.
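The probing recipe can be sketched end to end. The snippet below is a minimal illustration, not the paper's code: it fabricates synthetic "activations" that mix a current-entity and a prior-entity direction, fits one least-squares probe per slot on the same vectors, and checks that the two probes' weight directions are nearly orthogonal. Every name and the synthetic data-generating process are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_entities = 64, 2000, 4

# Hypothetical setup: each "activation" is the sum of a current-entity
# direction and a prior-entity direction, plus noise. In the paper these
# would be residual stream activations at a single token position.
cur_dirs = rng.normal(size=(n_entities, d))
pri_dirs = rng.normal(size=(n_entities, d))
cur_ids = rng.integers(0, n_entities, n)
pri_ids = rng.integers(0, n_entities, n)
X = cur_dirs[cur_ids] + pri_dirs[pri_ids] + 0.1 * rng.normal(size=(n, d))

def fit_probe(X, labels, n_classes):
    # One-vs-rest least-squares probe: W maps activations to class scores.
    Y = np.eye(n_classes)[labels]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W  # shape (d, n_classes)

# Two probes read the SAME activation vectors for two different labels.
W_cur = fit_probe(X, cur_ids, n_entities)
W_pri = fit_probe(X, pri_ids, n_entities)

acc_cur = np.mean((X @ W_cur).argmax(1) == cur_ids)
acc_pri = np.mean((X @ W_pri).argmax(1) == pri_ids)

# Orthogonality check: pairwise cosine similarity between slot directions.
def unit(v):
    return v / np.linalg.norm(v)

cos = np.array([[unit(W_cur[:, i]) @ unit(W_pri[:, j])
                 for j in range(n_entities)] for i in range(n_entities)])
print(acc_cur, acc_pri, np.abs(cos).max())
```

In this toy setting both probes decode their labels while their weight directions stay nearly orthogonal, which is the signature the paper reports for real residual stream activations.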

If this is right

  • The prior-entity slot enables relational tasks such as answering who came after a given character in a story.
  • Factual questions continue to ignore information available in the prior-entity slot.
  • Syntax that forces two full entity bindings onto one token exceeds the capacity of most current models.
  • The slot structure offers a substrate for behaviors that require holding two perspectives simultaneously.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Architectures that allow flexible access to both slots at once might improve performance on multi-entity reasoning tasks.
  • The same separation could be probed in other contexts where models must maintain dual views, such as consistency checking across a dialogue.
  • Frontier models' success on the double-binding syntax suggests they may have begun to develop additional binding mechanisms beyond the two-slot pattern.

Load-bearing premise

The probing method isolates information the model actually uses rather than directions that merely happen to align with entity distinctions in the chosen datasets.
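One standard way to pressure-test this premise is a control task in the style of probing-selectivity checks: compare probe accuracy on the true entity labels against accuracy on randomly reassigned labels. A large gap suggests the probe reads genuine structure rather than dataset-specific correlations. The sketch below uses synthetic activations; nothing in it comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, k = 32, 1000, 4

# Synthetic activations carrying a genuine entity signal (illustrative only).
dirs = rng.normal(size=(k, d))
labels = rng.integers(0, k, n)
X = dirs[labels] + 0.2 * rng.normal(size=(n, d))

def probe_accuracy(X, y, k):
    # Least-squares one-vs-rest probe, trained on the first half of the
    # data and evaluated on the held-out second half.
    half = len(y) // 2
    Y = np.eye(k)[y[:half]]
    W, *_ = np.linalg.lstsq(X[:half], Y, rcond=None)
    return np.mean((X[half:] @ W).argmax(1) == y[half:])

real_acc = probe_accuracy(X, labels, k)
# Control task: random relabeling severs any genuine structure, so a
# selective probe should drop to chance on held-out data.
control_acc = probe_accuracy(X, rng.permutation(labels), k)
print(real_acc, control_acc)
```

The referee's cross-dataset transfer request is the stronger version of the same idea: a probe that tracks a real mechanism should survive a change of corpus, while one fit to dataset quirks should not.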

What would settle it

An experiment in which intervening on the prior-entity slot changes accuracy on explicit factual retrieval questions, or in which open-weight models succeed at double subject-verb-object syntax while frontier models fail.
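The intervention half of this test has a standard form in the interpretability literature: project the prior-entity direction out of the activations and re-measure factual-retrieval accuracy. A minimal sketch of that projection step, with illustrative names (the paper does not specify an ablation procedure):

```python
import numpy as np

def ablate_direction(acts, direction):
    """Remove each activation's component along `direction`.

    If zeroing the prior-entity direction leaves factual-retrieval
    behavior unchanged, the slot is decodable but functionally unused,
    which is what the paper's correlational evidence suggests.
    """
    u = direction / np.linalg.norm(direction)
    return acts - np.outer(acts @ u, u)

# Toy demonstration on random "activations" and a random target direction.
rng = np.random.default_rng(1)
acts = rng.normal(size=(8, 16))
u = rng.normal(size=16)
ablated = ablate_direction(acts, u)

# After ablation, projections onto the direction are numerically zero,
# while all other components of the activations are untouched.
print(np.abs(ablated @ (u / np.linalg.norm(u))).max())
```

In a real experiment `acts` would be residual stream activations patched back into the forward pass, and the direction would come from the fitted prior-entity probe.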

read the original abstract

Language models must bind entities to the attributes they possess and maintain several such binding relationships within a context. We study how multiple entities are represented across token positions and whether single tokens can carry bindings for more than one entity. We introduce a multi-slot probing approach that disentangles a single token's residual stream activation to recover information about both the currently described entity and the immediately preceding one. These two kinds of information are encoded in separate and largely orthogonal "current-entity" and "prior-entity" slots. We analyze the functional roles of these slots and find that they serve different purposes. In tandem with the current-entity slot, the prior-entity slot supports relational inferences, such as entity-level induction ("who came after Alice in the story?") and conflict detection between adjacent entities. However, only the current-entity slot is used for explicit factual retrieval questions ("Is anyone in the story tall?" "What is the tall entity's name?") despite these answers being linearly decodable from the prior-entity slot too. Consistent with this limitation, open-weight models perform near chance accuracy at processing syntax that forces two subject-verb-object bindings on a single token (e.g., "Alice prepares and Bob consumes food.") Interestingly, recent frontier models can parse this properly, suggesting they may have developed more sophisticated binding strategies. Overall, our results expose a gap between information that is available in activations and information the model actually uses, and suggest that the current/prior-entity slot structure is a natural substrate for behaviors that require holding two perspectives at once, such as sycophancy and deception.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces a multi-slot probing method to disentangle residual stream activations at individual tokens into two largely orthogonal directions: a 'current-entity' slot carrying information about the entity being described at that position and a 'prior-entity' slot for the immediately preceding entity. Through probing and behavioral experiments on held-out inputs, it claims these slots serve distinct functional roles—current-entity for explicit factual retrieval questions, and both slots together for relational inferences such as entity induction and conflict detection—while also explaining why most models fail at double-binding syntax (e.g., 'Alice prepares and Bob consumes food') but frontier models succeed. The work highlights a gap between linearly decodable information and information the model functionally uses.

Significance. If the central claims hold, the results offer a concrete mechanistic account of entity tracking and binding in transformers, with direct relevance to understanding limitations in multi-entity reasoning, relational inference, and phenomena such as sycophancy. The distinction between availability and functional use of information is a valuable framing, and the observation that recent frontier models handle double-binding syntax better suggests an evolving capacity that could be tracked over model generations. The probing approach itself may generalize to other binding problems.

major comments (3)
  1. [§5] §5 (functional roles experiments): The claim that 'only the current-entity slot is used for explicit factual retrieval questions' despite linear decodability from the prior slot rests on higher probe accuracy for the current direction and near-chance model performance on double-binding syntax. This is correlational; without an intervention that selectively perturbs or ablates the prior-entity direction (while preserving the current direction) and demonstrates no change in factual retrieval accuracy, the functional non-use conclusion remains unestablished.
  2. [§3] §3 (multi-slot probing method): The orthogonality and separation of current- and prior-entity directions are demonstrated via linear probes, but the manuscript does not report controls for whether these directions generalize beyond the specific entity-attribute datasets used or whether they capture functional routing rather than dataset-specific correlations. Additional cross-dataset probe transfer results or synthetic controls would be needed to support the 'slots' interpretation.
  3. [§4.3] §4.3 (double-binding syntax tests): The near-chance performance on constructions forcing two SVO bindings on one token is presented as consistent with the slot limitation, but the paper does not quantify how much of the failure is attributable to the prior slot being inaccessible versus other factors such as attention patterns or training data distribution. A breakdown by model scale and error type would clarify the link to the slot hypothesis.
minor comments (3)
  1. [Figure 3] Figure 3 and associated text: The visualization of slot orthogonality would benefit from reporting the full distribution of cosine similarities across layers and positions rather than selected examples, to allow readers to assess robustness.
  2. [Methods] Methods section: Data exclusion criteria and the exact number of examples per condition are not fully specified; including these details (or a link to the dataset) would improve reproducibility.
  3. Notation: The terms 'current-entity slot' and 'prior-entity slot' are used interchangeably with 'directions' in some places; consistent terminology would reduce ambiguity when discussing functional roles versus representational geometry.
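The robustness reporting requested in the first minor comment amounts to computing the cosine similarity between the two slot directions at every layer and summarizing the full distribution rather than selected examples. A sketch, with hypothetical per-layer probe directions standing in for the paper's fitted probes:

```python
import numpy as np

rng = np.random.default_rng(2)
n_layers, d = 12, 64

# Hypothetical per-layer probe directions for the two slots; in the
# paper's setting these would be the weight vectors of probes fit at
# each layer (and optionally each token position).
cur = rng.normal(size=(n_layers, d))
pri = rng.normal(size=(n_layers, d))

def cosine(a, b):
    # Row-wise cosine similarity between matched direction pairs.
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1)
                              * np.linalg.norm(b, axis=-1))

sims = cosine(cur, pri)  # one similarity per layer

# Report the whole distribution, not cherry-picked layers.
summary = {
    "min": float(sims.min()),
    "median": float(np.median(sims)),
    "max": float(sims.max()),
}
print(summary)
```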

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments identify valuable opportunities to strengthen the evidence for our claims about the functional roles of the entity slots. We address each major comment below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: §5 (functional roles experiments): The claim that 'only the current-entity slot is used for explicit factual retrieval questions' despite linear decodability from the prior slot rests on higher probe accuracy for the current direction and near-chance model performance on double-binding syntax. This is correlational; without an intervention that selectively perturbs or ablates the prior-entity direction (while preserving the current direction) and demonstrates no change in factual retrieval accuracy, the functional non-use conclusion remains unestablished.

    Authors: We agree that the evidence presented is correlational and that selective interventions would provide stronger causal support for the conclusion that the prior-entity slot is not functionally used for explicit factual retrieval. Our current argument combines higher probe accuracy on the current direction with near-chance behavioral performance on double-binding syntax. In the revised manuscript we will add an explicit limitations subsection in the discussion that acknowledges this gap and outlines feasible future interventions (e.g., activation steering or direction-specific ablation). We will also report more granular per-direction probe accuracies to better quantify the observed disparity. revision: partial

  2. Referee: §3 (multi-slot probing method): The orthogonality and separation of current- and prior-entity directions are demonstrated via linear probes, but the manuscript does not report controls for whether these directions generalize beyond the specific entity-attribute datasets used or whether they capture functional routing rather than dataset-specific correlations. Additional cross-dataset probe transfer results or synthetic controls would be needed to support the 'slots' interpretation.

    Authors: We appreciate the call for stronger controls on generalization. In the revision we will add cross-dataset probe transfer results, including experiments on a new synthetic dataset with procedurally generated entities and attributes. These results will be reported alongside the original findings to demonstrate that the orthogonal directions are not artifacts of the particular entity-attribute corpus and instead reflect a more general routing mechanism. revision: yes

  3. Referee: §4.3 (double-binding syntax tests): The near-chance performance on constructions forcing two SVO bindings on one token is presented as consistent with the slot limitation, but the paper does not quantify how much of the failure is attributable to the prior slot being inaccessible versus other factors such as attention patterns or training data distribution. A breakdown by model scale and error type would clarify the link to the slot hypothesis.

    Authors: We concur that a finer-grained error analysis would help isolate the contribution of the slot limitation. The revised §4.3 will include a breakdown of accuracy by model scale and by error category (e.g., failure to bind the second subject versus attribute misassignment). We will also add a qualitative comparison of attention patterns across successful and failing cases to assess whether attention dynamics provide an independent explanation, while noting that the consistent pattern across scales remains most parsimoniously explained by the two-slot capacity. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical probing and behavioral analysis

full rationale

The paper's claims rest on direct multi-slot linear probing of residual stream activations and accuracy measurements on held-out behavioral tasks (factual retrieval, relational inference, double-binding syntax). These are experimental observations of decodability and performance differentials, not derivations that reduce by construction to fitted parameters renamed as predictions, self-definitional equations, or load-bearing self-citations. No mathematical chain equates outputs to inputs; the gap between linear decodability and functional use is evidenced by task-specific results rather than assumed.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard assumptions of linear probing in mechanistic interpretability (that directions in activation space correspond to readable features) and the validity of the chosen synthetic datasets for isolating entity bindings. No new free parameters, axioms, or invented entities are introduced beyond the probing technique itself.

axioms (1)
  • domain assumption: Linear directions in residual stream activations can be isolated via probing to recover distinct entity representations.
    Invoked when claiming that current and prior information are encoded in separate orthogonal slots.

pith-pipeline@v0.9.0 · 5577 in / 1409 out tokens · 16389 ms · 2026-05-09T23:44:34.132545+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

18 extracted references · 13 canonical work pages · 6 internal anchors

  1. [1]

    Understanding intermediate layers using linear classifier probes

Guillaume Alain and Yoshua Bengio. Understanding intermediate layers using linear classifier probes. In ICLR Workshop, 2017. arXiv:1610.01644

  2. [2]

    Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

    Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nick Turner, Cem Anil, Carson Denison, Amanda Askell, Robert Lasenby, Yifan Wu, Shauna Kravec, Nicholas Schiefer, Tim Maxwell, Nicholas Joseph, Zac Hatfield-Dodds, Alex Tamkin, Karina Nguyen, Brayden McLean, Josiah E. Burke, Tristan Hume, Shan Carter, Tom Henighan, and...

  3. [3]

    Sparse Autoencoders Find Highly Interpretable Features in Language Models

Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, and Lee Sharkey. Sparse autoencoders find highly interpretable features in language models. arXiv preprint arXiv:2309.08600, 2023

  4. [4]

    Representational analysis of binding in language models

    Qin Dai, Benjamin Heinzerling, and Kentaro Inui. Representational analysis of binding in language models. arXiv preprint arXiv:2409.05448, 2024

  5. [5]

    Measuring the persuasiveness of language models

    Esin Durmus, Liane Lovitt, Alex Tamkin, Stuart Ritchie, Jack Clark, and Deep Ganguli. Measuring the persuasiveness of language models. https://www.anthropic.com/research/measuring-model-persuasiveness, 2024. Anthropic

  6. [6]

How do language models bind entities in context?

    Jiahai Feng and Jacob Steinhardt. How do language models bind entities in context? In International Conference on Learning Representations, 2024. arXiv:2310.17191

  7. [7]

    Mixing mechanisms: How language models retrieve bound entities in-context

    Yoav Gur-Arieh, Mor Geva, and Atticus Geiger. Mixing mechanisms: How language models retrieve bound entities in-context. arXiv preprint arXiv:2510.06182, 2025

  8. [8]

    Representational similarity analysis – connecting the branches of systems neuroscience

    Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2:4, 2008

  9. [9]

    Locating and editing factual associations in GPT

    Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. In Advances in Neural Information Processing Systems, 2022. arXiv:2202.05262

  10. [10]

    In-context Learning and Induction Heads

    Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, a...

  11. [11]

    AI deception: A survey of examples, risks, and potential solutions

    Peter S. Park, Simon Goldstein, Aidan O'Gara, Michael Chen, and Dan Hendrycks. AI deception: A survey of examples, risks, and potential solutions. Patterns, 5(5), 2024. arXiv:2308.14752

  12. [12]

    Fine-tuning enhances existing mechanisms: A case study on entity tracking

Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, and David Bau. Fine-tuning enhances existing mechanisms: A case study on entity tracking. In International Conference on Learning Representations, 2024. arXiv:2402.14811

  13. [13]

    Steering Llama 2 via Contrastive Activation Addition

Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, and Alexander Matt Turner. Steering Llama 2 via contrastive activation addition. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024. arXiv:2312.06681

  14. [14]

    Towards Understanding Sycophancy in Language Models

    Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, and Ethan Perez. Towards understanding sycophancy in language models. In Internat...

  15. [15]

    Tensor product variable binding and the representation of symbolic structures in connectionist systems

    Paul Smolensky. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1–2):159–216, 1990

  16. [16]

    A feature-integration theory of attention

    Anne M. Treisman and Garry Gelade. A feature-integration theory of attention. Cognitive Psychology, 12(1):97–136, 1980

  17. [17]

    Steering Language Models With Activation Engineering

Alexander Matt Turner, Lisa Thiergart, David Udell, Gavin Leech, Ulisse Mini, and Monte MacDiarmid. Activation addition: Steering language models without optimization. arXiv preprint arXiv:2308.10248, 2023

  18. [18]

    Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, 2022. arXiv:2201.11903