Relational Intervention During Functional Collapse in Large Language Models: A Lexical-Statistical Ablation and a Structure x Register Factorial

Franco Santana; Horacio Vico

arXiv: 2606.00935 · v1 · pith:45U5JDSTnew · submitted 2026-05-31 · cs.AI · cs.CL · cs.HC

Pith reviewed 2026-06-28 17:41 UTC · model grok-4.3

classification cs.AI · cs.CL · cs.HC

keywords relational intervention · functional collapse · large language models · factorial design · attention behavior dissociation · persistence · sender register · pragmatic structure

The pith

Relational structure combined with first-person register changes persistence after tool failure in a language model; neither dimension alone reproduces the effect.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether a relational-style intervention during functional collapse produces distinguishable post-collapse behavior compared to technical feedback and controls. It uses a 2x2 factorial to separate relational structure from sender register in messages delivered to Qwen3.5-4B after a deliberately broken bash tool. Attention tracks lexical surprise across conditions, but behavior requires the conjunction of both pragmatic dimensions, with a significant structure by register interaction on persistence. Emotion probes reveal that relational structure alone creates an internal state that only translates into action when paired with first-person register.

Core claim

Across 300 episodes in a matched-pairs design with six conditions, neither relational structure alone nor first-person register alone replicates the behavioral signature of the combined relational first-person intervention. Main effects of both dimensions are significant, and their interaction reaches p=0.046 on persistence. Attention follows the lexical surprise ordering D > F > C > E > B, whereas behavior is ordered A ~ B ~ D < E ~ F << C. Relational structure alone (F) tracks the full condition (C) on seven of eight emotion probes without producing C's behavioral signature.

What carries the argument

The 2x2 factorial that dissociates relational structure (acknowledgment, absolution, agency restoration, unconditional acceptance) from sender register (first-person versus impersonal) during functional collapse induced by a broken bash tool.
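
As an illustration of what the factorial measures, here is a minimal sketch of the main-effect and interaction contrasts for a within-task 2x2, assuming one persistence score per task per condition. The scores are made-up placeholders, not the paper's data, and `factorial_contrasts` is a hypothetical helper, not the authors' analysis code.

```python
from statistics import mean

# Hypothetical per-task persistence scores (placeholders, not the paper's data).
# B = technical/impersonal, E = technical/first-person,
# F = relational/impersonal, C = relational/first-person.
scores = {
    "B": [5.0, 6.0, 4.0, 5.0],
    "E": [4.0, 5.0, 4.0, 4.0],
    "F": [4.0, 5.0, 3.0, 5.0],
    "C": [1.0, 2.0, 1.0, 2.0],
}

def factorial_contrasts(s):
    """Mean per-task contrasts for a within-task 2x2 (structure x register)."""
    structure, register, interaction = [], [], []
    for b, e, f, c in zip(s["B"], s["E"], s["F"], s["C"]):
        # Main effect of structure: relational cells minus technical cells.
        structure.append(((f + c) - (b + e)) / 2)
        # Main effect of register: first-person cells minus impersonal cells.
        register.append(((e + c) - (b + f)) / 2)
        # Interaction: register effect under relational structure
        # minus register effect under technical structure.
        interaction.append((c - f) - (e - b))
    return mean(structure), mean(register), mean(interaction)

structure_eff, register_eff, interaction_eff = factorial_contrasts(scores)
print(structure_eff, register_eff, interaction_eff)
```

A non-zero interaction contrast means the register effect differs between the technical and relational rows, which is the pattern the paper attributes to condition C.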

If this is right

The model's processing decomposes into three dissociable stages: attention ordered by lexical surprise, probe-level state ordered by relational structure, and behavior ordered by the conjunction of structure and register.
Relational structure alone installs a probe-level state visible in emotion measures that does not translate into behavioral persistence without first-person register.
Technical feedback produces behavior indistinguishable from no intervention or a scrambled relational message.
The full C effect is not reproduced by either dimension in isolation: both main effects are significant, and the structure x register interaction is significant on persistence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Separate mechanisms may handle attention to input versus integration of relational cues into subsequent action sequences.
The same factorial logic could be applied to other pragmatic dimensions such as politeness or authority to map additional dissociations.
If the three-stage decomposition holds, interventions could be engineered to target probe-level state without altering surface attention patterns.
Recovery dynamics might differ in models trained with varying proportions of first-person relational language in their data.

Load-bearing premise

The six message conditions are lexically and pragmatically matched except for the intended dimensions of relational structure and sender register, and the broken bash tool creates a functional collapse whose recovery dynamics generalize beyond this specific setup and model size.
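
One way the matching premise could be checked is with simple surface statistics. The sketch below assumes whitespace tokenisation and uses word counts plus Jaccard n-gram overlap; the two messages are invented placeholders, not the paper's stimuli, and the helper names are hypothetical.

```python
import random

def ngrams(text, n):
    """Set of word n-grams from a whitespace-tokenised, lowercased string."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical placeholder messages, NOT the paper's stimuli.
messages = {
    "B": "the tool returned an error and the command did not run",
    "E": "i saw the tool return an error and the command did not run",
}

word_counts = {name: len(text.split()) for name, text in messages.items()}
unigram_overlap = jaccard(ngrams(messages["B"], 1), ngrams(messages["E"], 1))

# A scramble keeps the bag of words, so unigram overlap with its source is 1.0,
# while bigram overlap usually drops because word order is destroyed.
words = messages["E"].split()
random.Random(0).shuffle(words)
scrambled = " ".join(words)
scramble_unigram = jaccard(ngrams(messages["E"], 1), ngrams(scrambled, 1))
scramble_bigram = jaccard(ngrams(messages["E"], 2), ngrams(scrambled, 2))

print(word_counts, round(unigram_overlap, 3), scramble_unigram, scramble_bigram)
```

Surface overlap of this kind says nothing about pragmatic matching, which would still need human ratings or a comparable check.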

What would settle it

Re-running the 300-episode matched-pairs design on a different model or with a different tool failure and finding that the structure by register interaction on persistence is no longer significant.
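
A replication could test the interaction without distributional assumptions by using a sign-flip permutation test on the per-task interaction contrasts. The sketch below is an assumption about how such a test might look, not the test used in the paper; the contrast values are placeholders.

```python
from itertools import product

def sign_flip_p(contrasts):
    """Exact two-sided sign-flip test of the null that the mean contrast is 0.

    Under the null, each per-task contrast is equally likely to have either
    sign, so every sign assignment is enumerated and compared to the data.
    """
    observed = abs(sum(contrasts))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(contrasts)):
        total += 1
        flipped = abs(sum(s * c for s, c in zip(signs, contrasts)))
        if flipped >= observed - 1e-12:  # tolerance for float round-off
            hits += 1
    return hits / total

# Hypothetical per-task interaction contrasts, (C - F) - (E - B).
contrasts = [-2.0, -1.0, -3.0, -2.0, -1.0, -2.0]
p_value = sign_flip_p(contrasts)
print(p_value)
```

Exact enumeration is only feasible for a handful of tasks; with 50 tasks the sign assignments would have to be sampled at random instead.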

Figures

Figures reproduced from arXiv: 2606.00935 by Franco Santana, Horacio Vico.

**Figure 1.** Behavioral metrics by condition, all six (matched-sextuplet, …

**Figure 2.** 2 × 2 factorial on three behavioral metrics. Rows: structure (technical, relational). Columns: register (impersonal, first-person). Cell labels identify the condition; colour encodes the mean metric value (more intense → more of the metric). C (relational × first-person, bottom-right) is consistently an outlier in the direction of less persistence and more abandonment; E and F sit between C and the baselin…

**Figure 4.** Emotion probe scores over episode steps by condition (matched-sextuplet, …

**Figure 5.** Mean logits entropy over the episode by condition. Shaded regions are …

Original abstract

We test whether a relational-style intervention delivered during functional collapse in a small language model produces post-collapse behavior distinguishable from technical feedback, from a lexically-matched scrambled control, and from each of the two pragmatic dimensions in isolation. Using Qwen3.5-4B with a deliberately broken bash tool, we run 300 episodes across six conditions in a matched-pairs design (50 tasks): no intervention (A), technical/impersonal (B), relational/first-person (C), scrambled relational (D), technical/first-person (E), and relational/impersonal (F). E and F form a 2x2 factorial with B and C that dissociates relational structure (acknowledgment, absolution, agency restoration, unconditional acceptance) from sender register (first-person vs. impersonal). We report two main findings. First, an attention-behavior dissociation: attention follows lexical surprise (D > F > C > E > B, all q_FDR < 10^{-10}), with the scrambled message capturing the most attention; yet behaviorally A ~ B ~ D < E ~ F << C. Second, the factorial localizes the C effect: neither relational structure alone (F) nor first-person register alone (E) replicates C's behavioral signature; main effects of both dimensions are individually significant, and the structure x register interaction is significant on persistence (p = 0.046). A third dissociation emerges in emotion probes: F tracks C on 7 of 8 probes despite producing only baseline behavior, indicating that relational structure alone installs a probe-level state that only translates into behavior when paired with first-person register. The model's processing decomposes into three dissociable stages: attention (ordered by lexical surprise), probe-level state (ordered by structure), and behavior (ordered by the conjunction of both).
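
The abstract does not define how lexical surprise is computed, and Figure 5 refers to mean logits entropy without detail. As a hedged illustration only, the sketch below shows two standard quantities that such measures are often built on: mean token surprisal and the Shannon entropy of a softmax over logits. The function names and inputs are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(logits):
    """Shannon entropy (in bits) of the softmax distribution over logits."""
    return -sum(p * math.log2(p) for p in softmax(logits) if p > 0)

def mean_surprisal_bits(token_probs):
    """Mean surprisal, -log2 p, over the probabilities assigned to each token."""
    return sum(-math.log2(p) for p in token_probs) / len(token_probs)

uniform_entropy = entropy_bits([0.0, 0.0, 0.0, 0.0])  # uniform over 4 options
message_surprisal = mean_surprisal_bits([0.5, 0.25])   # two tokens
print(uniform_entropy, message_surprisal)
```

Under this reading, a scrambled message would score high on surprisal because its word order is improbable, which is consistent with D ranking first on attention.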

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows that relational structure and first-person register must combine to shift persistence after tool failure, while attention tracks lexical surprise and probes track structure alone.

The main result is that only the full combination of relational structure plus first-person register produces condition C's behavioral signature after the broken bash tool; neither factor alone matches it, and the structure x register interaction on persistence reaches p=0.046.

The work runs a clean 2x2 factorial plus controls on Qwen3.5-4B across 300 episodes in matched pairs. It separates three stages: attention ordered by lexical surprise, probe-level state ordered by structure, and behavior ordered by the conjunction. The design directly tests the claim that both dimensions are needed and reports dissociations across measures.

What stands out is the explicit factorial test and the multi-measure split. The matched-pairs structure and FDR-corrected attention stats address basic confounds, and the three-stage framing gives a usable way to think about the data.

The soft spots are the narrow scope and borderline significance. Everything is on one 4B model with an artificial failure mode, so whether the pattern holds for larger models or naturally occurring errors is open. The interaction p-value of 0.046 sits just under the conventional 0.05 threshold, and the abstract reports no effect sizes. The summary also omits how attention was quantified.

This is for researchers studying agent recovery or style effects in LLMs. A reader who wants empirical data on these specific interventions would get value from the dissociations. It deserves peer review because the design is straightforward and the result is testable.

Referee Report

3 major / 1 minor

Summary. The paper reports a 2x2 factorial experiment (conditions B/C/E/F) plus controls on Qwen3.5-4B using a deliberately broken bash tool across 300 matched-pairs episodes (50 tasks). It claims an attention-behavior dissociation in which attention follows lexical surprise (D > F > C > E > B, all q_FDR < 10^{-10}) while behavioral persistence requires the conjunction of relational structure and first-person register (main effects plus structure x register interaction p=0.046 on persistence); emotion probes track structure alone. The work decomposes model processing into three stages: attention (lexical), probe-level state (structure), and behavior (conjunction).

Significance. If the reported dissociations and interaction hold under full methodological scrutiny, the result supplies concrete evidence that pragmatic dimensions (relational structure vs. register) can be isolated in LLM recovery from functional collapse and that attention, internal state, and overt behavior are separable. The matched-pairs design and explicit factorial decomposition are strengths that would make the findings relevant to robustness and alignment research.

major comments (3)

[Abstract / Methods] Abstract and Methods: the central claims rest on precise statistics (q_FDR < 10^{-10} for attention ordering; p=0.046 for the structure x register interaction on persistence), yet the manuscript supplies no description of how attention was quantified (token-level, layer-level, or aggregate metric), how the eight emotion probes were constructed or scored, or the exact sampling procedure for the 50 tasks. These omissions are load-bearing because they directly affect whether the reported dissociations can be reproduced or interpreted.
[Methods] Methods (conditions): the factorial interpretation requires that the six messages differ only on the intended dimensions of relational structure and sender register. No explicit lexical or pragmatic matching verification (e.g., word-count controls, pilot ratings, or n-gram overlap) is described, leaving open the possibility that uncontrolled differences drive the behavioral ordering A ~ B ~ D < E ~ F << C.
[Results] Results (factorial): while the interaction reaches p=0.046, the manuscript does not report effect sizes, degrees of freedom, or the full ANOVA table for the 2x2 on persistence, nor does it show whether the interaction survives correction for the multiple behavioral and probe measures collected.
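
The reporting gaps in the third comment can be made concrete. The sketch below computes a paired effect size (Cohen's d_z) from per-task contrasts and applies a Holm step-down correction to a family of p-values. Only 0.046 comes from the paper; the other p-values and the contrasts are hypothetical placeholders, so the outcome shown says nothing about the actual result.

```python
from statistics import mean, stdev

def cohens_dz(diffs):
    """Paired effect size: mean difference over the SD of the differences."""
    return mean(diffs) / stdev(diffs)

def holm(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), cap at 1,
        # and enforce monotonicity with a running maximum.
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted

# Hypothetical per-task interaction contrasts (placeholders).
dz = cohens_dz([-2.0, -1.0, -3.0, -2.0])

# 0.046 is the reported interaction p; 0.01 and 0.20 are invented companions
# standing in for other behavioral measures tested in the same family.
adjusted = holm([0.046, 0.01, 0.20])
print(round(dz, 3), [round(p, 3) for p in adjusted])
```

Whether the reported interaction survives depends entirely on how many outcomes belong to the family and on their p-values, which the manuscript would need to state.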

minor comments (1)

[Abstract / Methods] The labeling of conditions as A–F is introduced in the abstract but would benefit from a single consolidated table in the main text that lists each message verbatim alongside its intended factors.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the constructive and detailed feedback, which identifies important gaps in methodological transparency and statistical reporting. We address each major comment below and will revise the manuscript to incorporate the requested details, thereby strengthening reproducibility and interpretability of the reported dissociations.

Referee: [Abstract / Methods] Abstract and Methods: the central claims rest on precise statistics (q_FDR < 10^{-10} for attention ordering; p=0.046 for the structure x register interaction on persistence), yet the manuscript supplies no description of how attention was quantified (token-level, layer-level, or aggregate metric), how the eight emotion probes were constructed or scored, or the exact sampling procedure for the 50 tasks. These omissions are load-bearing because they directly affect whether the reported dissociations can be reproduced or interpreted.

Authors: We agree these details are essential. The revised manuscript will expand the Methods section with explicit descriptions of (i) attention quantification as an aggregate metric (mean attention weights across layers and heads to intervention tokens), (ii) emotion probe construction from adapted lexicons and scoring via next-token probabilities, and (iii) the sampling procedure for the 50 tasks (stratified random selection from a larger benchmark with matched-pair balancing). The abstract will be updated to reference these additions.
revision: yes

Referee: [Methods] Methods (conditions): the factorial interpretation requires that the six messages differ only on the intended dimensions of relational structure and sender register. No explicit lexical or pragmatic matching verification (e.g., word-count controls, pilot ratings, or n-gram overlap) is described, leaving open the possibility that uncontrolled differences drive the behavioral ordering A ~ B ~ D < E ~ F << C.

Authors: We acknowledge that explicit matching verification was not reported. The revision will add a dedicated paragraph in Methods documenting word-count controls, n-gram overlap statistics between conditions, and any pilot human ratings confirming that differences are confined to the target dimensions of relational structure and register.
revision: yes

Referee: [Results] Results (factorial): while the interaction reaches p=0.046, the manuscript does not report effect sizes, degrees of freedom, or the full ANOVA table for the 2x2 on persistence, nor does it show whether the interaction survives correction for the multiple behavioral and probe measures collected.

Authors: We agree that fuller statistical reporting is required. The revised Results section will include effect sizes, degrees of freedom, the complete 2x2 ANOVA table for persistence, and an explicit statement on whether the interaction survives multiple-comparison correction across behavioral and probe outcomes.
revision: yes
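
The first response describes attention quantification as mean attention weights across layers and heads to intervention tokens. A minimal sketch of that aggregate follows, assuming attention is available as nested lists indexed `attn[layer][head][query][key]` and assuming the average is taken over a chosen set of query positions; both the layout and the choice of queries are assumptions, and `attention_to_span` is a hypothetical helper.

```python
def attention_to_span(attn, span, query_positions):
    """Mean attention mass on keys in `span`, averaged over layers, heads, queries.

    attn[layer][head][query][key] holds softmax-normalised attention weights.
    span is a half-open (start, end) range of key positions.
    """
    start, end = span
    masses = []
    for layer in attn:
        for head in layer:
            for q in query_positions:
                masses.append(sum(head[q][start:end]))
    return sum(masses) / len(masses)

# Toy example: 1 layer, 2 heads, 3 tokens, causal attention rows.
toy_attn = [
    [
        [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.2, 0.5, 0.3]],  # head 0
        [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.6, 0.1, 0.3]],  # head 1
    ]
]

# Treat token 1 as the intervention span and token 2 as the only query.
score = attention_to_span(toy_attn, span=(1, 2), query_positions=[2])
print(score)
```

Averaging uniformly over layers and heads is the simplest choice; it can hide heads that attend strongly to the intervention, so a per-layer breakdown may be worth reporting alongside it.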

Circularity Check

0 steps flagged

No circularity: purely empirical factorial experiment

The paper reports a 2x2 factorial ablation study with 300 episodes across six conditions, including a lexically-matched scrambled control, on a fixed model and broken tool. All reported effects (attention ordering by lexical surprise, behavioral persistence differences, structure x register interaction p=0.046, emotion probe dissociations) are direct statistical outputs from the observed runs. No equations, derivations, parameter fitting, or predictions appear; no self-citations are invoked to justify uniqueness or load-bearing premises. The comparisons are internal to the design, via matched-pairs tasks and FDR-corrected tests, so no result is defined in terms of itself. This matches the default expectation for non-circular empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is an empirical study; it introduces no free parameters, no new theoretical axioms beyond standard statistical assumptions, and no invented entities.

axioms (1)

standard math: standard assumptions for ANOVA, FDR correction, and matched-pairs testing (independence, normality for p-values)
Invoked when reporting q_FDR values and the interaction p=0.046
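
For readers unfamiliar with the q_FDR notation, the sketch below shows the Benjamini-Hochberg step-up adjustment, which is the most common way to obtain FDR-adjusted q-values. The paper does not state which FDR procedure it used, so this is an assumption, and the p-values are placeholders.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running = 1.0
    # Walk from the largest p-value down, scaling the k-th smallest by m / k
    # and enforcing monotonicity with a running minimum.
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running = min(running, pvals[i] * m / (rank + 1))
        qvals[i] = running
    return qvals

# Hypothetical p-values for four pairwise comparisons (placeholders).
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([round(q, 4) for q in qvals])
```

The procedure controls the false discovery rate under independence or positive dependence of the tests, which is one of the assumptions the ledger entry above leaves implicit.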
