pith. machine review for the scientific record.

arxiv: 2604.18158 · v1 · submitted 2026-04-20 · 💻 cs.AI

Recognition: unknown

State Transfer Reveals Reuse in Controlled Routing

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 04:22 UTC · model grok-4.3

classification 💻 cs.AI
keywords state transfer · controlled routing · model reuse · prompt interventions · fixed interfaces · GPT-2 · Qwen · interpretability

The pith

Fixed-interface state transfer provides stronger evidence of reuse in language model routing than prompt success alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how language models store and reuse specific behaviors during controlled routing tasks. It identifies candidate interfaces on support data, then evaluates them on held-out queries using matched controls that test necessity, sufficiency, and wrong-interface baselines. Results show that transferring state at one fixed position recovers donor accuracy without retraining, while trainable prompts can achieve the same behavior at multiple other positions only after extra examples and optimization. This distinction matters because it locates where a behavior is actually represented inside the model rather than confirming only that the model can be made to produce the output.

Core claim

In controlled routing tasks, fixed-interface transfer is stronger evidence of reuse than trained prompt success alone. On GPT-2 triop, an early interface supports exact transfer under these tests. On GPT-2 add/sub, zero-retrain compiled transfer at the fixed interface recovers most of donor routing accuracy, while trainable prompt slots can relearn the same behavior at several other positions only after additional support examples and optimization. Qwen routing provides a cross-architecture consistency check for the same matched-interface pattern at the operator token, although donor-specific identity on the local V-path remains unresolved. Generation and reasoning branches are used to map scope: they show broader transport or weaker controller identifiability once control depends on longer trajectories or harder selection.

What carries the argument

Fixed-interface transfer, which moves a state representation identified on support data to a recipient model and validates it with necessity, sufficiency, and wrong-interface controls on held-out queries.
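The protocol above can be made concrete with a toy sketch: capture the activation at one fixed interface (here, an operator position) from a donor run, patch it into a recipient run, and compare against a wrong-interface control. Everything in this snippet (the toy model, the position names, the encoding) is illustrative, not the paper's actual setup.

```python
# Toy sketch of fixed-interface state transfer with matched controls.
# The "model" is a hand-written routing task: the state at the operator
# position decides whether the readout adds or subtracts.

def encode(op, a, b):
    """Toy 'forward pass': per-position states for [op, a, b]."""
    return {"op_pos": +1 if op == "add" else -1, "a_pos": a, "b_pos": b}

def readout(states):
    """Routing readout: the op-position state selects the branch."""
    if states["op_pos"] > 0:
        return states["a_pos"] + states["b_pos"]
    return states["a_pos"] - states["b_pos"]

def run(op, a, b, patch=None):
    states = encode(op, a, b)
    if patch is not None:
        pos, value = patch
        states[pos] = value  # state transfer at the chosen interface
    return readout(states)

# Held-out queries and the donor behavior the transfer should recover.
queries = [("sub", 5, 2), ("sub", 9, 4)]
donor = [run(op, a, b) for op, a, b in queries]

# Sufficiency: transplanting the donor's op-position state into an
# "add"-forced run recovers the donor routing without any retraining.
donor_state = encode("sub", 0, 0)["op_pos"]
transferred = [run("add", a, b, patch=("op_pos", donor_state))
               for _, a, b in queries]

# Wrong-interface control: patching an unrelated position must not transfer.
wrong = [run("add", a, b, patch=("b_pos", donor_state)) for _, a, b in queries]

print(transferred == donor)  # True: sufficiency at the matched interface
print(wrong == donor)        # False: the wrong interface fails, as it should
```

In the paper's real setting the patched quantity is a hidden activation rather than a hand-coded flag, and necessity is tested separately by ablating the interface; the logic of the comparison is the same.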

If this is right

  • Exact transfer occurs at an early fixed interface in GPT-2 triop routing without further training.
  • Zero-retrain transfer at the fixed interface recovers most donor accuracy in GPT-2 add/sub tasks.
  • Trainable prompt slots require additional support examples and can relearn the behavior at multiple other positions.
  • Qwen models exhibit the same matched-interface pattern at the operator token across architectures.
  • Generation and reasoning branches show broader transport or weaker controller identifiability once control depends on longer trajectories.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same transfer protocol could be used to test whether other model behaviors are stored at fixed positions rather than being prompt-relocatable.
  • If fixed interfaces prove reliable, targeted state editing might become possible without retraining entire prompts.
  • The approach separates representation location from behavioral capability, which may help interpretability work distinguish storage from computation.

Load-bearing premise

The interfaces identified on support data, evaluated on held-out queries, and validated with matched necessity, sufficiency, and wrong-interface controls accurately isolate the behaviorally relevant state representation.

What would settle it

If wrong-interface controls recover accuracy as well as the chosen interface on held-out data, or if fixed-interface transfer fails to recover accuracy while relocated trainable prompts succeed after few examples.

Figures

Figures reproduced from arXiv: 2604.18158 by Muchen Jiang, Xingyu Zhou, Yanzhen Lu, Zhicheng Qian.

Figure 1. GPT-2 triop: the left panel shows exact transfer at the matched interface; the right panel shows that the […]
Figure 2. Triop localization: early layers dominate (left), and control position is privileged (right).
Figure 3. Copy2 generation interface comparison. Transport improves as the write/read scope becomes broader.
Figure 4. CopyN generation scaling for interface interventions.
Figure 5. Fixed-corrupt protocol: trajectory-level state transport remains effective; tok1-only does not.
Figure 6. Qwen local compiled family-width census under matched support-selected search. Canonical […]
Figure 7. Within the retained controller comparison, Qwen solve is proposal-rich but commit-poor. Small oracle […]
Figure 8. GPT-2 add/sub same-budget learned baseline suite. All learned rows use the same held-out split and an […]
Figure 9. Compiled transport versus learned relocation on GPT-2 add/sub. The dashed line marks the compiled […]
read the original abstract

Prompt-based interventions can change model behavior, but trained success alone does not identify where the behaviorally relevant state is represented. We study this question in controlled routing tasks using interfaces chosen on support data, held-out query evaluation, and matched necessity, sufficiency, and wrong-interface controls. On GPT-2 triop, an early interface supports exact transfer under these tests. On GPT-2 add/sub, zero-retrain compiled transfer at the fixed interface recovers most of donor routing accuracy, while trainable prompt slots can relearn the same behavior at several other positions only after additional support examples and optimization. These results distinguish fixed-interface reuse from prompt relocation in a setting where the two can be tested directly. Qwen routing provides a cross-architecture consistency check for the same matched-interface pattern at the operator token, although donor-specific identity on the local V-path remains unresolved. Generation and reasoning branches are used to map scope: they show broader transport or weaker controller identifiability once control depends on longer trajectories or harder selection. In controlled routing, fixed-interface transfer is therefore stronger evidence of reuse than trained prompt success alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript claims that prompt success alone does not locate behaviorally relevant state in language models, and that fixed-interface transfer under matched controls provides stronger evidence of reuse. In controlled routing tasks, interfaces are selected on support data, evaluated on held-out queries, and validated with necessity, sufficiency, and wrong-interface controls. On GPT-2 triop an early interface yields exact transfer; on GPT-2 add/sub zero-retrain compiled transfer at the fixed interface recovers most donor accuracy while trainable prompt slots relearn the behavior at other positions only after extra support and optimization. Qwen routing supplies a cross-architecture check at the operator token. Generation and reasoning branches map scope, showing broader transport or weaker identifiability on longer trajectories. The central conclusion is that fixed-interface transfer distinguishes reuse from prompt relocation more convincingly than trained-prompt success.

Significance. If the experimental controls and quantitative outcomes hold, the work supplies a concrete, falsifiable protocol for distinguishing internal state reuse from surface prompt relocation in routing and control settings. The matched necessity/sufficiency/wrong-interface design and the zero-retrain versus trainable comparison are methodologically useful for interpretability research. Cross-architecture consistency on Qwen and the scope-mapping via generation/reasoning branches add breadth. The approach could inform future mechanistic studies of how models implement conditional routing.

major comments (2)
  1. [Results (GPT-2 add/sub)] Results section on GPT-2 add/sub: the claim that zero-retrain compiled transfer recovers 'most' donor routing accuracy is load-bearing for the central distinction between fixed-interface reuse and prompt relocation, yet the abstract and visible description supply no numerical values, standard errors, or direct comparison to the trainable-prompt baselines; without these the strength of the evidence cannot be assessed.
  2. [Experimental setup and controls] Methods / experimental setup: the weakest assumption—that the chosen interfaces plus necessity/sufficiency/wrong-interface controls accurately isolate the behaviorally relevant state—is central to interpreting transfer as reuse rather than artifact. The paper must supply explicit quantitative criteria, statistical thresholds, or ablation tables showing how each control was scored and passed.
minor comments (3)
  1. [Introduction] Define 'triop' and 'add/sub' routing tasks at first use with a short example or pseudocode; readers outside the immediate subfield cannot reconstruct the task from the abstract alone.
  2. [Qwen routing results] The Qwen section states that 'donor-specific identity on the local V-path remains unresolved.' Clarify whether this is a limitation of the current controls or an open question for future work, and state its impact on the cross-architecture claim.
  3. [Scope mapping] Generation and reasoning branches are introduced to 'map scope.' Provide a brief table or figure caption summarizing the key differences in transport strength or controller identifiability between the two branches.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recommending minor revision. The comments identify opportunities to make the quantitative evidence and control criteria more explicit, which we address below.

read point-by-point responses
  1. Referee: [Results (GPT-2 add/sub)] Results section on GPT-2 add/sub: the claim that zero-retrain compiled transfer recovers 'most' donor routing accuracy is load-bearing for the central distinction between fixed-interface reuse and prompt relocation, yet the abstract and visible description supply no numerical values, standard errors, or direct comparison to the trainable-prompt baselines; without these the strength of the evidence cannot be assessed.

    Authors: We agree that the abstract and high-level results summary would benefit from explicit numbers. The full experimental results (Section 4.2) contain the per-condition accuracies, standard errors across seeds, and direct comparisons showing that zero-retrain transfer at the fixed interface recovers the large majority of donor performance while trainable slots at other positions recover substantially less under matched support data. We will revise the abstract and the opening of the results section to report these values and the baseline comparison. revision: yes

  2. Referee: [Experimental setup and controls] Methods / experimental setup: the weakest assumption—that the chosen interfaces plus necessity/sufficiency/wrong-interface controls accurately isolate the behaviorally relevant state—is central to interpreting transfer as reuse rather than artifact. The paper must supply explicit quantitative criteria, statistical thresholds, or ablation tables showing how each control was scored and passed.

    Authors: We accept that clearer documentation of the control thresholds is warranted. The Methods section defines the three controls, but we will add an explicit statement of the quantitative criteria (e.g., necessity: ablation drops accuracy to within a small margin of chance; sufficiency: interface alone reaches a high fraction of donor accuracy; wrong-interface: no above-chance transfer) together with a summary table or supplementary ablation results that tabulate pass/fail outcomes and the statistical tests applied for each experiment. revision: yes
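The control criteria the authors propose can be sketched as an explicit scoring rule. The threshold values below are illustrative placeholders, not numbers from the paper.

```python
# Hypothetical pass/fail scoring for the three matched controls, with
# explicit (illustrative) thresholds as the rebuttal proposes to document.

def score_controls(acc, *, donor_acc, chance, necessity_margin=0.05,
                   sufficiency_frac=0.9, wrong_margin=0.05):
    """acc: dict of held-out accuracies for 'ablated', 'transfer', 'wrong'."""
    return {
        # necessity: ablating the interface drops accuracy to near chance
        "necessity": acc["ablated"] <= chance + necessity_margin,
        # sufficiency: transfer alone reaches a high fraction of donor accuracy
        "sufficiency": acc["transfer"] >= sufficiency_frac * donor_acc,
        # wrong-interface: no above-chance transfer at an unmatched position
        "wrong_interface": acc["wrong"] <= chance + wrong_margin,
    }

result = score_controls({"ablated": 0.27, "transfer": 0.91, "wrong": 0.25},
                        donor_acc=0.95, chance=0.25)
print(result)  # all three controls pass under these illustrative numbers
```

A table of such pass/fail outcomes per experiment, plus the statistical test behind each threshold, is what the referee's second major comment asks for.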

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper reports empirical experiments on state transfer in controlled routing tasks using GPT-2 and Qwen models. Interface selection occurs on support data, with evaluation on held-out queries and matched necessity/sufficiency/wrong-interface controls. No equations, derivations, fitted parameters presented as predictions, self-citations, or ansatzes appear in the provided text. Central claims rest on observable transfer outcomes distinguishing fixed-interface reuse from prompt relocation, without reduction to inputs by construction or self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract describes an empirical experimental protocol without mathematical derivations, free parameters, or new postulated entities; it relies on standard assumptions from machine learning and interpretability research.

pith-pipeline@v0.9.0 · 5491 in / 957 out tokens · 46455 ms · 2026-05-10T04:22:57.300093+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

12 extracted references · 2 canonical work pages · 1 internal anchor

  1. [1] Attention Is All You Need, 2017.
  2. [2] The Power of Scale for Parameter-Efficient Prompt Tuning. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021.
  3. [3] Prefix-Tuning: Optimizing Continuous Prompts for Generation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021.
  4. [4] LoRA: Low-Rank Adaptation of Large Language Models, 2021.
  5. [5] Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets, 2022.
  6. [6] Language Models are Unsupervised Multitask Learners, 2019.
  7. [7] Meng, Kevin; Bau, David; Andonian, Alex; Belinkov, Yonatan. Locating and Editing Factual Associations in GPT, January 2023. arXiv:2202.05262.
  8. [8] GPT Understands, Too, 2023.
  9. [9] P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks, 2022.
  10. [10] Activation Addition: Steering Language Models Without Optimization, 2023.
  11. [11] Zou, Andy; Phan, Long; Chen, Sarah; Campbell, James; Guo, Phillip; Ren, Richard; Pan, Alexander; Yin, Xuwang; Mazeika, Mantas; Dombrowski, Ann-Kathrin; Goel, Shashwat; Li, Nathaniel; Byun, Michael J.; Wang, Zifan; Mallen, Alex; Basart, Steven; Koyejo, Sanmi; Song, Dawn; Fredrikson, Matt; Kolter, J.; et al. Representation Engineering: A Top-Down Approach to AI Transparency.
  12. [12] Controllable Context Sensitivity and the Knob Behind It, 2024.