pith. machine review for the scientific record.

arxiv: 2604.18158 · v1 · submitted 2026-04-20 · 💻 cs.AI

Recognition: unknown

State Transfer Reveals Reuse in Controlled Routing

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 04:22 UTC · model grok-4.3

classification 💻 cs.AI
keywords state transfer · controlled routing · model reuse · prompt interventions · fixed interfaces · GPT-2 · Qwen · interpretability

The pith

Fixed-interface state transfer provides stronger evidence of reuse in language model routing than prompt success alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how language models store and reuse specific behaviors during controlled routing tasks. It identifies candidate interfaces on support data, then evaluates them on held-out queries using matched controls that test necessity, sufficiency, and wrong-interface baselines. Results show that transferring state at one fixed position recovers donor accuracy without retraining, while trainable prompts can achieve the same behavior at multiple other positions only after extra examples and optimization. This distinction matters because it locates where a behavior is actually represented inside the model rather than confirming only that the model can be made to produce the output.

Core claim

In controlled routing tasks, fixed-interface transfer is stronger evidence of reuse than trained prompt success alone. On GPT-2 triop, an early interface supports exact transfer under these tests. On GPT-2 add/sub, zero-retrain compiled transfer at the fixed interface recovers most of donor routing accuracy, while trainable prompt slots can relearn the same behavior at several other positions only after additional support examples and optimization. Qwen routing provides a cross-architecture consistency check for the same matched-interface pattern at the operator token, although donor-specific identity on the local V-path remains unresolved. Generation and reasoning branches are used to map scope: they show broader transport or weaker controller identifiability once control depends on longer trajectories or harder selection.

What carries the argument

Fixed-interface transfer, which moves a state representation identified on support data to a recipient model and validates it with necessity, sufficiency, and wrong-interface controls on held-out queries.
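The protocol above can be made concrete with a toy sketch: capture the activation at one fixed interface (here, an operator position) from a donor run, patch it into a recipient run, and compare against a wrong-interface control. Everything in this snippet (the toy model, the position names, the encoding) is illustrative, not the paper's actual setup.

```python
# Toy sketch of fixed-interface state transfer with matched controls.
# The "model" is a hand-written routing task: the state at the operator
# position decides whether the readout adds or subtracts.

def encode(op, a, b):
    """Toy 'forward pass': per-position states for [op, a, b]."""
    return {"op_pos": +1 if op == "add" else -1, "a_pos": a, "b_pos": b}

def readout(states):
    """Routing readout: the op-position state selects the branch."""
    if states["op_pos"] > 0:
        return states["a_pos"] + states["b_pos"]
    return states["a_pos"] - states["b_pos"]

def run(op, a, b, patch=None):
    states = encode(op, a, b)
    if patch is not None:
        pos, value = patch
        states[pos] = value  # state transfer at the chosen interface
    return readout(states)

# Held-out queries and the donor behavior the transfer should recover.
queries = [("sub", 5, 2), ("sub", 9, 4)]
donor = [run(op, a, b) for op, a, b in queries]

# Sufficiency: transplanting the donor's op-position state into an
# "add"-forced run recovers the donor routing without any retraining.
donor_state = encode("sub", 0, 0)["op_pos"]
transferred = [run("add", a, b, patch=("op_pos", donor_state))
               for _, a, b in queries]

# Wrong-interface control: patching an unrelated position must not transfer.
wrong = [run("add", a, b, patch=("b_pos", donor_state)) for _, a, b in queries]

print(transferred == donor)  # True: sufficiency at the matched interface
print(wrong == donor)        # False: the wrong interface fails, as it should
```

In the paper's real setting the patched quantity is a hidden activation rather than a hand-coded flag, and necessity is tested separately by ablating the interface; the logic of the comparison is the same.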

If this is right

  • Exact transfer occurs at an early fixed interface in GPT-2 triop routing without further training.
  • Zero-retrain transfer at the fixed interface recovers most donor accuracy in GPT-2 add/sub tasks.
  • Trainable prompt slots require additional support examples and can relearn the behavior at multiple other positions.
  • Qwen models exhibit the same matched-interface pattern at the operator token across architectures.
  • Generation and reasoning branches show broader transport or weaker controller identifiability once control depends on longer trajectories.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same transfer protocol could be used to test whether other model behaviors are stored at fixed positions rather than being prompt-relocatable.
  • If fixed interfaces prove reliable, targeted state editing might become possible without retraining entire prompts.
  • The approach separates representation location from behavioral capability, which may help interpretability work distinguish storage from computation.

Load-bearing premise

The interfaces identified on support data, evaluated on held-out queries, and validated with matched necessity, sufficiency, and wrong-interface controls accurately isolate the behaviorally relevant state representation.

What would settle it

If wrong-interface controls recover accuracy as well as the chosen interface on held-out data, or if fixed-interface transfer fails to recover accuracy while relocated trainable prompts succeed after few examples.

Figures

Figures reproduced from arXiv: 2604.18158 by Muchen Jiang, Xingyu Zhou, Yanzhen Lu, Zhicheng Qian.

Figure 1. GPT-2 triop: the left panel shows exact transfer at the matched interface; the right panel shows that the […]
Figure 2. Triop localization: early layers dominate (left), and control position is privileged (right).
Figure 3. Copy2 generation interface comparison. Transport improves as the write/read scope becomes broader.
Figure 4. CopyN generation scaling for interface interventions.
Figure 5. Fixed-corrupt protocol: trajectory-level state transport remains effective; tok1-only does not.
Figure 6. Qwen local compiled family-width census under matched support-selected search. Canonical […]
Figure 7. Within the retained controller comparison, Qwen solve is proposal-rich but commit-poor. Small oracle […]
Figure 8. GPT-2 add/sub same-budget learned baseline suite. All learned rows use the same held-out split and an […]
Figure 9. Compiled transport versus learned relocation on GPT-2 add/sub. The dashed line marks the compiled […]
read the original abstract

Prompt-based interventions can change model behavior, but trained success alone does not identify where the behaviorally relevant state is represented. We study this question in controlled routing tasks using interfaces chosen on support data, held-out query evaluation, and matched necessity, sufficiency, and wrong-interface controls. On GPT-2 triop, an early interface supports exact transfer under these tests. On GPT-2 add/sub, zero-retrain compiled transfer at the fixed interface recovers most of donor routing accuracy, while trainable prompt slots can relearn the same behavior at several other positions only after additional support examples and optimization. These results distinguish fixed-interface reuse from prompt relocation in a setting where the two can be tested directly. Qwen routing provides a cross-architecture consistency check for the same matched-interface pattern at the operator token, although donor-specific identity on the local V-path remains unresolved. Generation and reasoning branches are used to map scope: they show broader transport or weaker controller identifiability once control depends on longer trajectories or harder selection. In controlled routing, fixed-interface transfer is therefore stronger evidence of reuse than trained prompt success alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript claims that prompt success alone does not locate behaviorally relevant state in language models, and that fixed-interface transfer under matched controls provides stronger evidence of reuse. In controlled routing tasks, interfaces are selected on support data, evaluated on held-out queries, and validated with necessity, sufficiency, and wrong-interface controls. On GPT-2 triop an early interface yields exact transfer; on GPT-2 add/sub zero-retrain compiled transfer at the fixed interface recovers most donor accuracy while trainable prompt slots relearn the behavior at other positions only after extra support and optimization. Qwen routing supplies a cross-architecture check at the operator token. Generation and reasoning branches map scope, showing broader transport or weaker identifiability on longer trajectories. The central conclusion is that fixed-interface transfer distinguishes reuse from prompt relocation more convincingly than trained-prompt success.

Significance. If the experimental controls and quantitative outcomes hold, the work supplies a concrete, falsifiable protocol for distinguishing internal state reuse from surface prompt relocation in routing and control settings. The matched necessity/sufficiency/wrong-interface design and the zero-retrain versus trainable comparison are methodologically useful for interpretability research. Cross-architecture consistency on Qwen and the scope-mapping via generation/reasoning branches add breadth. The approach could inform future mechanistic studies of how models implement conditional routing.

major comments (2)
  1. [Results (GPT-2 add/sub)] Results section on GPT-2 add/sub: the claim that zero-retrain compiled transfer recovers 'most' donor routing accuracy is load-bearing for the central distinction between fixed-interface reuse and prompt relocation, yet the abstract and visible description supply no numerical values, standard errors, or direct comparison to the trainable-prompt baselines; without these the strength of the evidence cannot be assessed.
  2. [Experimental setup and controls] Methods / experimental setup: the weakest assumption—that the chosen interfaces plus necessity/sufficiency/wrong-interface controls accurately isolate the behaviorally relevant state—is central to interpreting transfer as reuse rather than artifact. The paper must supply explicit quantitative criteria, statistical thresholds, or ablation tables showing how each control was scored and passed.
minor comments (3)
  1. [Introduction] Define 'triop' and 'add/sub' routing tasks at first use with a short example or pseudocode; readers outside the immediate subfield cannot reconstruct the task from the abstract alone.
  2. [Qwen routing results] The Qwen section states that 'donor-specific identity on the local V-path remains unresolved.' Clarify whether this is a limitation of the current controls or an open question for future work, and state its impact on the cross-architecture claim.
  3. [Scope mapping] Generation and reasoning branches are introduced to 'map scope.' Provide a brief table or figure caption summarizing the key differences in transport strength or controller identifiability between the two branches.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for recommending minor revision. The comments identify opportunities to make the quantitative evidence and control criteria more explicit, which we address below.

read point-by-point responses
  1. Referee: [Results (GPT-2 add/sub)] Results section on GPT-2 add/sub: the claim that zero-retrain compiled transfer recovers 'most' donor routing accuracy is load-bearing for the central distinction between fixed-interface reuse and prompt relocation, yet the abstract and visible description supply no numerical values, standard errors, or direct comparison to the trainable-prompt baselines; without these the strength of the evidence cannot be assessed.

    Authors: We agree that the abstract and high-level results summary would benefit from explicit numbers. The full experimental results (Section 4.2) contain the per-condition accuracies, standard errors across seeds, and direct comparisons showing that zero-retrain transfer at the fixed interface recovers the large majority of donor performance while trainable slots at other positions recover substantially less under matched support data. We will revise the abstract and the opening of the results section to report these values and the baseline comparison. revision: yes

  2. Referee: [Experimental setup and controls] Methods / experimental setup: the weakest assumption—that the chosen interfaces plus necessity/sufficiency/wrong-interface controls accurately isolate the behaviorally relevant state—is central to interpreting transfer as reuse rather than artifact. The paper must supply explicit quantitative criteria, statistical thresholds, or ablation tables showing how each control was scored and passed.

    Authors: We accept that clearer documentation of the control thresholds is warranted. The Methods section defines the three controls, but we will add an explicit statement of the quantitative criteria (e.g., necessity: ablation drops accuracy to within a small margin of chance; sufficiency: interface alone reaches a high fraction of donor accuracy; wrong-interface: no above-chance transfer) together with a summary table or supplementary ablation results that tabulate pass/fail outcomes and the statistical tests applied for each experiment. revision: yes
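The control criteria the authors propose can be sketched as an explicit scoring rule. The threshold values below are illustrative placeholders, not numbers from the paper.

```python
# Hypothetical pass/fail scoring for the three matched controls, with
# explicit (illustrative) thresholds as the rebuttal proposes to document.

def score_controls(acc, *, donor_acc, chance, necessity_margin=0.05,
                   sufficiency_frac=0.9, wrong_margin=0.05):
    """acc: dict of held-out accuracies for 'ablated', 'transfer', 'wrong'."""
    return {
        # necessity: ablating the interface drops accuracy to near chance
        "necessity": acc["ablated"] <= chance + necessity_margin,
        # sufficiency: transfer alone reaches a high fraction of donor accuracy
        "sufficiency": acc["transfer"] >= sufficiency_frac * donor_acc,
        # wrong-interface: no above-chance transfer at an unmatched position
        "wrong_interface": acc["wrong"] <= chance + wrong_margin,
    }

result = score_controls({"ablated": 0.27, "transfer": 0.91, "wrong": 0.25},
                        donor_acc=0.95, chance=0.25)
print(result)  # all three controls pass under these illustrative numbers
```

A table of such pass/fail outcomes per experiment, plus the statistical test behind each threshold, is what the referee's second major comment asks for.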

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper reports empirical experiments on state transfer in controlled routing tasks using GPT-2 and Qwen models. Interface selection occurs on support data, with evaluation on held-out queries and matched necessity/sufficiency/wrong-interface controls. No equations, derivations, fitted parameters presented as predictions, self-citations, or ansatzes appear in the provided text. Central claims rest on observable transfer outcomes distinguishing fixed-interface reuse from prompt relocation, without reduction to inputs by construction or self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract describes an empirical experimental protocol without mathematical derivations, free parameters, or new postulated entities; it relies on standard assumptions from machine learning and interpretability research.

pith-pipeline@v0.9.0 · 5491 in / 957 out tokens · 46455 ms · 2026-05-10T04:22:57.300093+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

12 extracted references · 2 canonical work pages · 1 internal anchor

  1. [1] Attention Is All You Need, 2017.
  2. [2] The Power of Scale for Parameter-Efficient Prompt Tuning. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021.
  3. [3] Prefix-Tuning: Optimizing Continuous Prompts for Generation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021.
  4. [4] LoRA: Low-Rank Adaptation of Large Language Models, 2021.
  5. [5] Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets, 2022.
  6. [6] Language Models are Unsupervised Multitask Learners, 2019.
  7. [7] Meng, Kevin; Bau, David; Andonian, Alex; Belinkov, Yonatan. Locating and Editing Factual Associations in GPT, January 2023. arXiv:2202.05262.
  8. [8] GPT Understands, Too, 2023.
  9. [9] P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks, 2022.
  10. [10] Activation Addition: Steering Language Models Without Optimization, 2023.
  11. [11] Zou, Andy; Phan, Long; Chen, Sarah; Campbell, James; Guo, Phillip; Ren, Richard; Pan, Alexander; Yin, Xuwang; Mazeika, Mantas; Dombrowski, Ann-Kathrin; Goel, Shashwat; Li, Nathaniel; Byun, Michael J.; Wang, Zifan; Mallen, Alex; Basart, Steven; Koyejo, Sanmi; Song, Dawn; Fredrikson, Matt; Kolter, J.; et al. Representation Engineering: A Top-Down Approach to AI Transparency.
  12. [12] Controllable Context Sensitivity and the Knob Behind It, 2024.