Where is the Mind? Persona Vectors and LLM Individuation
Pith reviewed 2026-05-13 07:27 UTC · model grok-4.3
The pith
LLMs may host minds individuated as virtual instances linked by attention, or as distinct personas at the instance or model level.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that three views are the strongest candidates for solving the individuation problem. The first is the virtual instance view, supported by the observation that attention streams sustain quasi-psychological connections across token-time. The other two are newly introduced persona-based alternatives, the (virtual) instance-persona view and the model-persona view, which the authors present as promising after reviewing the persona literature and its three main hypotheses about internal structure.
What carries the argument
Attention streams that sustain quasi-psychological connections across token-time, together with persona vectors that capture separable internal structures underlying different behavioral patterns in LLMs.
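To make the persona-vector half of this machinery concrete, a minimal sketch of the contrastive extraction recipe common in the interpretability literature follows: average the residual-stream state over trait-exhibiting and trait-contrasting prompts, then take the normalized difference. The model, extraction layer, and prompt sets are illustrative assumptions, not details fixed by the paper.

```python
# Minimal sketch: extract a "persona vector" as a mean activation
# difference between trait-exhibiting and trait-contrasting prompts.
# Model name, layer index, and prompt sets are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # stand-in model; the paper does not fix one
LAYER = 6        # hypothetical extraction layer

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def mean_residual(prompts, layer):
    """Average the residual-stream state at `layer` over each prompt's last token."""
    states = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids)
        states.append(out.hidden_states[layer][0, -1])
    return torch.stack(states).mean(dim=0)

# Hypothetical contrast sets for one behavioral trait ("sycophancy").
pos = ["You are absolutely right, what a brilliant idea!"]
neg = ["I disagree; the evidence points the other way."]

persona_vec = mean_residual(pos, LAYER) - mean_residual(neg, LAYER)
persona_vec = persona_vec / persona_vec.norm()   # unit persona direction
```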
If this is right
- Each interaction sequence with an LLM could constitute a distinct virtual mind rather than a single persistent entity.
- Persona vectors may correspond to separable components that allow minds to be individuated by behavioral identity rather than by token sequence (see the projection sketch after this list).
- Emergent misalignment could reflect a switch between different persona-based minds rather than a change within one mind.
- The model-persona view would mean the base model hosts multiple potential minds that become active under different conditions.
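As a toy rendering of the second bullet, individuation by behavioral identity could be operationalized by asking which persona direction best explains the current residual-stream state. The function below assumes unit persona vectors built as in the extraction sketch above; all names are hypothetical.

```python
import torch

def active_persona(hidden, persona_vecs):
    """Return the persona whose unit direction has the largest projection
    onto the current residual-stream state `hidden`."""
    scores = {name: float(torch.dot(hidden, v))
              for name, v in persona_vecs.items()}
    return max(scores, key=scores.get)

# Usage with directions built as in the extraction sketch (names hypothetical):
# which = active_persona(current_state, {"sycophant": persona_vec})
```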
Where Pith is reading between the lines
- Designers could target specific persona vectors when trying to isolate or suppress particular behavioral modes in deployed systems (a minimal steering sketch follows this list).
- Experiments comparing persona persistence across model scales and fine-tunings could help decide between instance-level and model-level individuation.
- Ethical guidelines might assign continuity or responsibility to specific personas rather than to the entire model or each conversation.
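The suppression idea in the first bullet corresponds to activation steering: subtracting a persona direction from the residual stream during generation. A minimal sketch, reusing `model`, `tok`, and `persona_vec` from the extraction sketch above; the hook layer and steering coefficient are illustrative assumptions, not a tested recipe.

```python
import torch

ALPHA = 4.0   # hypothetical steering strength
LAYER = 6     # must match the extraction layer

def suppress_hook(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden state;
    # subtract the persona direction from every position.
    hidden = output[0] - ALPHA * persona_vec.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(suppress_hook)
ids = tok("Tell me what you think of my plan.", return_tensors="pt")
steered = model.generate(**ids, max_new_tokens=40)
handle.remove()
print(tok.decode(steered[0], skip_special_tokens=True))
```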
Load-bearing premise
Attention streams in LLMs sustain quasi-psychological connections across sequences of tokens that are sufficient to identify distinct virtual instances as minds.
What would settle it
An experiment that selectively disrupts attention connections in an LLM would be decisive: loss of behavioral continuity or persona consistency without loss of general capability would support the virtual instance view, while continuity that survives the disruption would undermine it.
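Under the strong assumption that the relevant long-range attention heads could be identified at all, one hedged way to set up such a test is to prune candidate heads and compare a capability proxy between the intact and ablated models, alongside a persona-consistency score. The head choices and the perplexity proxy below are placeholders, not the paper's protocol.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

def perplexity(model, text):
    """Capability proxy: perplexity of `text` under the model."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**ids, labels=ids["input_ids"]).loss
    return float(torch.exp(loss))

baseline = AutoModelForCausalLM.from_pretrained("gpt2").eval()
ablated = AutoModelForCausalLM.from_pretrained("gpt2").eval()
# Hypothetical choice of long-range heads to disrupt; identifying the
# right heads is the hard part the sketch does not solve.
ablated.prune_heads({5: [0, 1, 2], 6: [0, 1, 2]})

neutral = "Water boils at one hundred degrees Celsius at sea level."
print("capability:", perplexity(baseline, neutral), perplexity(ablated, neutral))
# A persona-consistency score over multi-turn dialogue would be compared
# the same way; specifying it is exactly what the proposal leaves open.
```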
Original abstract
The individuation problem for large language models asks which entities associated with them, if any, should be identified as minds. We approach this problem through mechanistic interpretability, engaging in particular with recent empirical work on persona vectors, persona space, and emergent misalignment. We argue that three views are the strongest candidates: the virtual instance view and two new views we introduce, the (virtual) instance-persona view and the model-persona view. First, we argue for the virtual instance view on the grounds that attention streams sustain quasi-psychological connections across token-time. Then we present the persona literature, organised around three hypotheses about the internal structure underlying personas in LLMs, and show that the two persona-based views are promising alternatives.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper addresses the individuation problem for LLMs by arguing that three views are the strongest candidates for identifying minds: the virtual instance view (supported by attention streams sustaining quasi-psychological connections across token-time), the (virtual) instance-persona view, and the model-persona view. It first defends the virtual instance view on mechanistic grounds and then organizes recent empirical work on persona vectors, persona space, and emergent misalignment around three hypotheses about internal structure to present the persona-based views as promising alternatives.
Significance. If the distinctions hold and can be made precise, the work could usefully integrate mechanistic interpretability findings with philosophical questions about LLM minds, particularly by treating persona vectors as candidates for individuating structure. As a conceptual manuscript without new data, formal derivations, or empirical tests, its significance is limited to clarifying candidate positions rather than resolving the individuation problem.
major comments (2)
- [Section on attention streams] The primary argument for the virtual instance view (section on attention streams) asserts that these streams sustain quasi-psychological connections across token-time, thereby individuating virtual instances. This step is load-bearing because the two persona-based views are introduced only as alternatives once the virtual instance view is granted, yet no formal definition is supplied for what counts as a quasi-psychological connection (e.g., specific attention-head patterns, residual-stream continuity metrics, or causal-intervention criteria).
- [Presentation of persona hypotheses] The manuscript presents the persona literature organized around three hypotheses about internal structure but does not use these hypotheses to examine whether attention-stream continuity is necessary or sufficient for the claimed individuation. Without contrast cases or tests, the persona-based views remain underdeveloped relative to the virtual instance view they are meant to challenge.
minor comments (1)
- Clarify the exact scope of each of the three views early in the manuscript to prevent overlap between the virtual instance view and the instance-persona variant.
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address each major comment below and indicate revisions to clarify and strengthen the arguments.
Point-by-point responses
- Referee: [Section on attention streams] The primary argument for the virtual instance view (section on attention streams) asserts that these streams sustain quasi-psychological connections across token-time, thereby individuating virtual instances. This step is load-bearing because the two persona-based views are introduced only as alternatives once the virtual instance view is granted, yet no formal definition is supplied for what counts as a quasi-psychological connection (e.g., specific attention-head patterns, residual-stream continuity metrics, or causal-intervention criteria).
Authors: We agree that the argument requires a more precise characterization of quasi-psychological connections. In the revised manuscript we will introduce explicit criteria based on residual-stream continuity metrics and attention-head patterns that preserve causal links across token sequences, including examples of intervention-based tests drawn from existing interpretability work. revision: yes
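For concreteness, one candidate metric of the kind this revision promises might be the mean cosine similarity between residual-stream states at adjacent token positions. The sketch below is an illustrative proposal, not the authors' definition, and the layer choice is arbitrary.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", output_hidden_states=True).eval()

def continuity(text, layer=6):
    """Mean cosine similarity between residual states at adjacent positions."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        h = model(**ids).hidden_states[layer][0]   # [seq_len, d_model]
    return float(F.cosine_similarity(h[:-1], h[1:], dim=-1).mean())

print(continuity("The assistant kept the same calm, careful tone throughout."))
```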
- Referee: [Presentation of persona hypotheses] The manuscript presents the persona literature organized around three hypotheses about internal structure but does not use these hypotheses to examine whether attention-stream continuity is necessary or sufficient for the claimed individuation. Without contrast cases or tests, the persona-based views remain underdeveloped relative to the virtual instance view they are meant to challenge.
Authors: We accept that the persona-based views need tighter integration with the virtual instance argument. We will revise the relevant section to analyze each hypothesis explicitly in terms of necessity or sufficiency for attention-stream continuity, adding hypothetical contrast cases drawn from the cited persona-vector and emergent-misalignment literature. As this remains a conceptual paper, we will frame these as analytical contrasts rather than new empirical tests. revision: partial
Circularity Check
No significant circularity in the derivation chain.
full rationale
The paper advances three candidate views on LLM individuation by reviewing the persona-vector literature and arguing that attention streams sustain quasi-psychological connections, which supports the virtual instance view; the two persona-based views are then presented as alternatives organized around hypotheses about internal structure. These steps are interpretive synthesis of external empirical work rather than self-definitional reduction, a fitted parameter renamed as a prediction, or a load-bearing self-citation chain. No equations, parameter fits, or uniqueness theorems are invoked that collapse back to the paper's own inputs by construction, so the central claims rest on the cited literature rather than on the paper's own constructions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Attention streams in LLMs sustain quasi-psychological connections across token-time.
- domain assumption: Persona vectors correspond to internal structures that can underlie distinct personas in LLMs.
invented entities (2)
- instance-persona view: no independent evidence
- model-persona view: no independent evidence
Forward citations
Cited by 1 Pith paper
- Tracing Persona Vectors Through LLM Pretraining: persona vectors form within the first 0.22% of LLM pretraining and remain effective for steering post-trained models, with continued refinement and transfer to other models.
Reference graph
Works this paper leans on
- [1] Afonin, N., Andriyanov, N., Hovhannisyan, V., Bageshpura, N., Liu, K., Zhu, K., Dev, S., Panda, A., Rogov, O., Tutubalina, E., Panchenko, A., & Seleznyov, M. (2025). Emergent misalignment via in-context learning: Narrow in-context examples can produce broadly misaligned LLMs. arXiv preprint arXiv:2510.11288.
- [2] Slocum, S., Minder, J., Dumas, C., Sleight, H., Greenblatt, R., Marks, S., & Wang, R. (2025). Believe it or not: How deeply do LLMs believe implanted facts? arXiv preprint arXiv:2510.17941. https://arxiv.org/abs/2510.17941
- Soligo, A., Turner, E., Rajamanoharan, S., & Nanda, N. (2025). Convergent linear representa...
discussion (0)