pith. machine review for the scientific record.

arxiv: 2605.06196 · v1 · submitted 2026-05-07 · 💻 cs.AI · cs.CL


The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models


Pith reviewed 2026-05-08 10:08 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords language models · social roles · granularity axis · hidden states · activation steering · role representations · latent directions

The pith

Language models encode social role granularity as a dominant latent direction from individual to institutional scale.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that large language models internally represent the scale of prompted social roles, from micro-level personal experience to macro-level organizational or national reasoning, as an ordered geometric feature rather than a surface style choice. By subtracting the mean hidden state of micro roles from that of macro roles, the authors isolate an axis that aligns with the first principal component of all role representations at cosine 0.972 and captures 52.6 percent of variance in Qwen3-8B. Projections of roles onto this axis rise steadily across five defined granularity levels, stay consistent under changes in layers, prompts, and data splits, and transfer to Llama-3.1-8B-Instruct. Steering model activations along the axis predictably shifts the granularity of generated answers, moving Llama responses from 2.00 to 3.17 on a five-point scale for prompts that admit local responses.

Core claim

We show that LLMs encode the granularity of social roles, from micro-level individual experience to macro-level organizational, institutional, or national reasoning, as a structured, ordered, and causally manipulable latent direction. We define a contrast-based Granularity Axis as the difference between mean macro- and micro-role hidden states. In Qwen3-8B this axis aligns with PC1 of the role representation space at cosine 0.972 and accounts for 52.6 percent of its variance. We construct 75 social roles across five granularity levels, collect 91,200 role-conditioned responses, extract role-level hidden states, and find that projections increase monotonically across levels, remain stable across layers, prompt variants, endpoint definitions, held-out splits, and score-filtered subsets, and transfer to Llama-3.1-8B-Instruct.

What carries the argument

The Granularity Axis, defined as the vector difference between the average hidden states of macro-scale roles and micro-scale roles, which functions as the primary geometric direction organizing the space of role representations.
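
To make the construction concrete, here is a minimal sketch (not the authors' code) of the contrast axis and its PC1 alignment. The hidden-state matrix, dimensionality, and 15-roles-per-level split are placeholder assumptions:

```python
# Minimal sketch of the contrast-axis construction (placeholder data, not the
# authors' pipeline): H holds one mean hidden state per role, `levels` the
# assigned granularity label from 1 (micro) to 5 (macro).
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(75, 4096))          # assumed: 75 roles x hidden dim
levels = np.repeat(np.arange(1, 6), 15)  # assumed: 15 roles per level

# Granularity Axis: mean macro-role state minus mean micro-role state.
axis = H[levels == 5].mean(axis=0) - H[levels == 1].mean(axis=0)
axis /= np.linalg.norm(axis)

# PC1 of the centered role representation space, via SVD.
Hc = H - H.mean(axis=0)
_, s, vt = np.linalg.svd(Hc, full_matrices=False)
pc1, var_pc1 = vt[0], s[0] ** 2 / (s ** 2).sum()

# The paper reports |cos| ~= 0.972 and ~52.6% PC1 variance for Qwen3-8B;
# random placeholder data will of course give near-zero alignment.
print(f"cos(axis, PC1) = {abs(axis @ pc1):.3f}, PC1 variance = {var_pc1:.1%}")
```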

If this is right

  • Role hidden states project monotonically onto the axis across all five granularity levels and across prompt variants (a minimal check is sketched after this list).
  • The axis remains stable across model layers, endpoint choices, held-out data splits, and transfers to a second model.
  • Positive or negative activation steering along the axis shifts generated response granularity in the predicted direction.
  • The two tested models show different degrees of steering controllability depending on their default behavior.
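
The first prediction reduces to a simple statistic. A hedged sketch, repeating the placeholder data from the sketch above (random arrays standing in for real role-level states):

```python
# Sketch of the monotonicity check in the first prediction: project each
# role's mean hidden state onto the unit axis and test ordering by level.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
H = rng.normal(size=(75, 4096))          # placeholder role-level states
levels = np.repeat(np.arange(1, 6), 15)
axis = H[levels == 5].mean(axis=0) - H[levels == 1].mean(axis=0)
axis /= np.linalg.norm(axis)

proj = H @ axis                          # scalar projection per role
level_means = np.array([proj[levels == k].mean() for k in range(1, 6)])
rho, p = spearmanr(levels, proj)

print("per-level mean projections:", level_means.round(3))
print(f"Spearman rho = {rho:.3f} (p = {p:.2g}), "
      f"strictly increasing: {bool(np.all(np.diff(level_means) > 0))}")
```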

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This geometric structure could let practitioners read out or adjust the perspective scale of role-play outputs without rewriting prompts.
  • The same contrast method might reveal other ordered dimensions such as time horizon or emotional intensity in role representations.
  • If the axis generalizes, it offers a way to test whether models internally rank social contexts by scope rather than treating them as flat labels.

Load-bearing premise

The author-chosen 75 roles and five granularity levels cleanly separate micro-to-macro distinctions without other factors such as topic or response style driving the hidden-state patterns.

What would settle it

If role projections onto the axis fail to increase monotonically with assigned granularity level, or if activation steering along the axis produces no reliable shift in measured response granularity on held-out prompts, the central claim would not hold.
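
The steering half of that test is easy to prototype. A minimal sketch, assuming `model` is a Hugging Face causal LM (e.g., Llama-3.1-8B-Instruct), `axis` is the unit Granularity Axis as a torch tensor, and the layer index and coefficient are illustrative placeholders rather than the paper's settings:

```python
# Minimal steering sketch (assumptions, not the authors' code): add a scaled
# copy of the unit Granularity Axis to one decoder layer's output during
# generation. LAYER and ALPHA below are illustrative placeholders.
import torch

def make_steering_hook(axis: torch.Tensor, alpha: float):
    def hook(module, inputs, output):
        # Decoder layers typically return a tuple whose first element is the
        # hidden-state tensor of shape (batch, seq_len, hidden_dim).
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * axis.to(hidden.device, hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return hook

# Hypothetical usage with a Hugging Face causal LM:
# LAYER, ALPHA = 16, 8.0  # placeholders, not the paper's settings
# handle = model.model.layers[LAYER].register_forward_hook(
#     make_steering_hook(axis, ALPHA))
# output_ids = model.generate(**inputs, max_new_tokens=256)
# handle.remove()  # detach the hook so later generations are unsteered
```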

Figures

Figures reproduced from arXiv: 2605.06196 by Chonghan Qin, Jing Xiong, Lingpeng Kong, Xiachong Feng, Xiaocheng Feng, Ziyun Song.

Figure 1: Overview of the Granularity Axis pipeline. We construct ordered social roles, collect role-conditioned responses, extract role-level hidden-state representations, define a contrast-based Granularity Axis, and test its behavioral effect through activation steering.
Figure 2: Role representation space. Role-level hidden-state representations organize along a micro-to-macro structure. Colors indicate granularity level (L1–L5), and the dashed arrow denotes the contrast-defined Granularity Axis from micro-level to macro-level roles.
Figure 3: Ordered projections on the Granularity Axis. Points are roles grouped by granularity level; black circles mark level means, shaded bands within-level spread, stars the default assistant. Projections rise monotonically L1→L5 in both models; the default sits in a meso-to-macro region (near L3 in Qwen3-8B, L4 in Llama-3.1-8B-Instruct). Role-level representations are computed as $v^{(\ell)}_{u,s,q} = \frac{1}{T}\sum_{t=1}^{T} h^{(\ell)}_t$, then averaged over the response set $R(u)$.
Figure 4: System prompt templates used in the main pipeline for role-conditioned response generation.
Figure 5: Evaluation prompt used for role-play quality scoring in the main pipeline.
Figure 6: Judge prompt used for steering evaluation.
Original abstract

Large language models (LLMs) are routinely prompted to take on social roles ranging from individuals to institutions, yet it remains unclear whether their internal representations encode the granularity of such roles, from micro-level individual experience to macro-level organizational, institutional, or national reasoning. We show that they do. We define a contrast-based Granularity Axis as the difference between mean macro- and micro-role hidden states. In Qwen3-8B, this axis aligns with the principal axis (PC1) of the role representation space at cosine 0.972 and accounts for 52.6% of its variance, indicating that granularity is the dominant geometric axis organizing prompted social roles. We construct 75 social roles across five granularity levels and collect 91,200 role-conditioned responses over shared questions and prompt variants, then extract role-level hidden states and project them onto the axis. Role projections increase monotonically across all five levels, remain stable across layers, prompt variants, endpoint definitions, held-out splits, and score-filtered subsets, and transfer to Llama-3.1-8B-Instruct. The axis is also causally relevant: activation steering along it shifts response granularity in the predicted direction, with Llama moving from 2.00 to 3.17 on a five-point macro scale under positive steering on prompts that admit local responses. The two models differ in controllability, suggesting that steering depends on each model's default operating regime. Overall, our findings suggest that social role granularity is not merely a stylistic surface feature, but a structured, ordered, and causally manipulable latent direction in role-conditioned language model behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that LLMs internally represent the granularity of prompted social roles (micro-level individuals to macro-level institutions) along a dominant latent direction. It constructs 75 roles across five granularity levels, collects 91,200 role-conditioned responses, defines a Granularity Axis as the mean difference between macro- and micro-role hidden states, shows this axis aligns with PC1 (cosine 0.972, 52.6% variance) in Qwen3-8B, produces monotonic projections across levels that are stable across layers/prompts/held-out splits/score-filtered subsets and transfer to Llama-3.1-8B-Instruct, and demonstrates causal relevance via activation steering that shifts response granularity scores in the predicted direction.

Significance. If the central geometric and causal claims hold after controls for confounds, the result would be significant: it identifies granularity as a structured, ordered, and manipulable latent direction organizing social roles in LLMs rather than a surface stylistic feature. Strengths include the scale of the dataset (91,200 responses), multiple stability checks (held-out splits, score-filtered subsets, endpoint variants), cross-model transfer, and the causal steering experiment that produces measurable shifts (e.g., Llama from 2.00 to 3.17). These elements provide reproducible empirical grounding for a falsifiable geometric hypothesis.

major comments (3)
  1. [§3] §3 (Role Construction and Data Collection): The 75 author-defined roles and five granularity levels are presented without reported balancing, matching, or regression controls for topic, average response length, or lexical/stylistic differences across levels. Because the Granularity Axis is defined directly from the mean hidden states of these roles and then shown to align with PC1, any correlation between these covariates and the level labels would make the observed alignment (cosine 0.972) and monotonic projections consistent with a composite direction rather than a pure granularity axis. This is load-bearing for the claim that granularity is the dominant organizing axis.
  2. [§4.2] §4.2 (PCA Alignment and Variance): The reported 52.6% variance explained by PC1 and its alignment with the author-defined axis would be more convincing with an explicit baseline comparison (e.g., variance explained by axes derived from random role groupings or shuffled granularity labels). Without this, it remains possible that the high alignment is partly an artifact of how the endpoint sets were chosen rather than evidence that granularity is uniquely dominant in the 75-point role space. (A minimal permutation baseline is sketched after these comments.)
  3. [§5] §5 (Activation Steering): The steering results (shift from 2.00 to 3.17 on the five-point scale) are promising but depend on the claim that the prompts 'admit local responses'; the paper should report how the five-point macro scale was applied by raters and whether inter-rater reliability or prompt selection criteria were pre-registered to ensure the granularity shift is not driven by changes in topic or length induced by the steering vector.
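
A minimal sketch of the permutation baseline requested in major comment 2, on placeholder data (endpoint-set sizes are illustrative):

```python
# Sketch of the null baseline from major comment 2: draw random micro/macro
# endpoint sets, recompute the contrast axis, and compare its PC1 alignment
# to the reported value. Data and endpoint-set sizes are placeholders.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(75, 4096))
Hc = H - H.mean(axis=0)
pc1 = np.linalg.svd(Hc, full_matrices=False)[2][0]  # unit PC1 direction

def axis_pc1_cosine(macro_idx, micro_idx):
    a = H[macro_idx].mean(axis=0) - H[micro_idx].mean(axis=0)
    return abs(a @ pc1) / np.linalg.norm(a)

null = np.array([
    axis_pc1_cosine(p[:15], p[-15:])   # random endpoint groupings
    for p in (rng.permutation(75) for _ in range(1000))
])

observed = 0.972  # cosine reported for the true micro/macro split (Qwen3-8B)
print(f"null 99th percentile = {np.quantile(null, 0.99):.3f}, "
      f"observed = {observed}")
```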
minor comments (2)
  1. [§3] The definition of the Granularity Axis would benefit from an explicit equation (e.g., A = mean(H_macro) - mean(H_micro)) in the main text rather than only in prose; one plausible rendering follows these comments.
  2. [Figures 2-4] Figure captions for the projection plots should include the number of responses per level and any error bars or confidence intervals to allow visual assessment of the monotonicity strength.
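
One plausible rendering of the equation requested in minor comment 1, in our notation rather than the authors': \(\bar{h}_u\) is role \(u\)'s mean hidden state at the probed layer, \(\mathcal{U}_{\mathrm{macro}}\) and \(\mathcal{U}_{\mathrm{micro}}\) the endpoint role sets, and \(p_u\) the projection used throughout.

```latex
% Contrast-based Granularity Axis and role projection (notation is ours).
\[
  a \;=\; \frac{1}{|\mathcal{U}_{\mathrm{macro}}|}
          \sum_{u \in \mathcal{U}_{\mathrm{macro}}} \bar{h}_u
  \;-\;   \frac{1}{|\mathcal{U}_{\mathrm{micro}}|}
          \sum_{u \in \mathcal{U}_{\mathrm{micro}}} \bar{h}_u,
  \qquad
  p_u \;=\; \frac{\langle \bar{h}_u,\, a \rangle}{\lVert a \rVert}.
\]
```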

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each of the major concerns point by point below, providing clarifications and committing to revisions where appropriate to strengthen the paper.

Point-by-point responses
  1. Referee: [§3] §3 (Role Construction and Data Collection): The 75 author-defined roles and five granularity levels are presented without reported balancing, matching, or regression controls for topic, average response length, or lexical/stylistic differences across levels. Because the Granularity Axis is defined directly from the mean hidden states of these roles and then shown to align with PC1, any correlation between these covariates and the level labels would make the observed alignment (cosine 0.972) and monotonic projections consistent with a composite direction rather than a pure granularity axis. This is load-bearing for the claim that granularity is the dominant organizing axis.

    Authors: We appreciate this observation, as controlling for potential confounds is crucial for isolating the granularity effect. The roles were designed with shared questions across all granularity levels to control for topic, and prompt variants were used to increase robustness. However, we acknowledge that explicit balancing for response length and lexical features was not reported. In the revised manuscript, we will include: (i) statistics on average response lengths and token counts per granularity level, (ii) correlations between lexical metrics (e.g., type-token ratio) and level, and (iii) a regression analysis where we residualize the hidden states for these covariates before computing the axis and projections. This will demonstrate whether the axis remains aligned with PC1 after controls. We believe this addresses the concern without altering the core findings. (A minimal residualization sketch follows these responses.) revision: yes

  2. Referee: [§4.2] §4.2 (PCA Alignment and Variance): The reported 52.6% variance explained by PC1 and its alignment with the author-defined axis would be more convincing with an explicit baseline comparison (e.g., variance explained by axes derived from random role groupings or shuffled granularity labels). Without this, it remains possible that the high alignment is partly an artifact of how the endpoint sets were chosen rather than evidence that granularity is uniquely dominant in the 75-point role space.

    Authors: We agree that baseline comparisons are important to establish the uniqueness of the granularity axis. We will add to the revised §4.2 results from 1000 random role groupings (randomly assigning roles to micro/macro endpoints) and shuffled label permutations, computing the distribution of cosine similarities and variance explained. Our preliminary checks suggest the observed 0.972 cosine and 52.6% variance are outliers compared to these null distributions, supporting that granularity is the dominant direction. This addition will make the claim more robust. revision: yes

  3. Referee: [§5] §5 (Activation Steering): The steering results (shift from 2.00 to 3.17 on the five-point scale) are promising but depend on the claim that the prompts 'admit local responses'; the paper should report how the five-point macro scale was applied by raters and whether inter-rater reliability or prompt selection criteria were pre-registered to ensure the granularity shift is not driven by changes in topic or length induced by the steering vector.

    Authors: We thank the referee for highlighting the need for more details on the evaluation. The five-point scale was applied by two independent raters who scored responses on granularity from 1 (micro/individual) to 5 (macro/institutional), with prompts selected as those allowing both local and global responses (e.g., questions about decision-making that can be answered personally or organizationally). In the revision, we will provide the full rating instructions, report inter-rater agreement (e.g., Pearson correlation or kappa), and confirm that steered responses did not significantly differ in length or topic from baseline (via manual inspection and metrics). While the experiment was not pre-registered, the selection criteria were based on pilot testing to ensure validity. To further address confounds, we will add analysis showing the steering primarily affects granularity-related content. revision: partial
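
The residualization committed to in response 1 could look like the following sketch; covariate choices, names, and shapes are illustrative, not taken from the paper:

```python
# Hedged sketch of the covariate residualization promised in response 1:
# regress role-level hidden states on nuisance covariates (e.g., response
# length, type-token ratio), keep the residuals, and recompute the axis.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
H = rng.normal(size=(75, 4096))         # placeholder role-level hidden states
covariates = rng.normal(size=(75, 2))   # illustrative: length, lexical metric
levels = np.repeat(np.arange(1, 6), 15)

# Multi-output regression removes the linear covariate contribution.
H_resid = H - LinearRegression().fit(covariates, H).predict(covariates)

axis_resid = (H_resid[levels == 5].mean(axis=0)
              - H_resid[levels == 1].mean(axis=0))
axis_resid /= np.linalg.norm(axis_resid)
# The confound concern is addressed if projections onto axis_resid remain
# monotone in granularity level, as in the unadjusted analysis.
```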

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper defines the Granularity Axis explicitly as the difference between mean macro-role and mean micro-role hidden states, then reports its empirical alignment with PC1 of the 75-role space (cosine 0.972, 52.6% variance) and monotonic projections on intermediate levels as observed results. These are independent checks against the model's actual hidden-state geometry rather than reductions by construction. Steering experiments, cross-model transfer, and stability across splits provide further external validation. No self-citation chain, ansatz smuggling, or renaming of known results occurs; the derivation remains self-contained against the collected role-conditioned activations.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The central claim rests on the assumption that hidden states encode role granularity independently of other factors and that the author-defined role categories validly span the micro-macro spectrum. The granularity axis is introduced as a derived construct without external grounding beyond the reported internal consistencies.

free parameters (2)
  • Granularity level definitions and role assignments
    The five levels and 75 specific roles are constructed by the authors; different choices could alter the observed monotonicity and variance explained.
  • Micro and macro endpoint role sets
    The means used to define the axis depend on which roles are labeled micro versus macro.
axioms (2)
  • domain assumption · Hidden states extracted from role-conditioned prompts reflect the semantic granularity of the assigned role
    Invoked when computing role-level hidden states and projecting them onto the axis.
  • domain assumption · The contrast between mean macro and micro hidden states isolates granularity rather than correlated factors such as response style or topic
    This is the definitional step for the axis and is required for the claim that it organizes social roles.
invented entities (1)
  • Granularity Axis · no independent evidence
    purpose: A latent direction in hidden-state space that orders social roles from micro to macro
    Constructed via contrast of model activations; no independent external evidence such as a predicted observable outside the model is provided.

pith-pipeline@v0.9.0 · 5614 in / 1710 out tokens · 97916 ms · 2026-05-08T10:08:04.901063+00:00 · methodology


Reference graph

Works this paper leans on

81 extracted references · 31 canonical work pages · 10 internal anchors

  1. [1]

    Gpt-5 system card

    OpenAI. Gpt-5 system card. https://cdn.openai.com/gpt-5-system-card.pdf, 2025

  2. [2]

    The Llama 3 Herd of Models

    AI @ Meta Llama Team. The llama 3 herd of models.arXiv preprint, arXiv:2407.21783, 2024

  3. [3]

    Qwen3 Technical Report

    Qwen Team. Qwen3 technical report.arXiv preprint, arXiv:2505.09388, 2025

  4. [4]

    Gemini 3 model card

    Google. Gemini 3 model card. https://deepmind.google/models/gemini, 2025

  5. [5]

    Claude model card

    Anthropic. Claude model card. https://www.anthropic.com/system-cards, 2026

  6. [6]

    Generative Agents: Interactive Simulacra of Human Behavior

    Joon Sung Park, Joseph C. O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. Generative agents: Interactive simulacra of human behavior. In Sean Follmer, Jeff Han, Jürgen Steimle, and Nathalie Henry Riche, editors, Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, UIST, pages 2:1–2:22, 2023

  7. [7]

    doi: 10.1145/3586183.3606763

  8. [8]

    From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-Based Agents

    Xinyi Mou, Xuanwen Ding, Qi He, Liang Wang, Jingcong Liang, Xinnong Zhang, Libo Sun, Jiayu Lin, Jie Zhou, Xuanjing Huang, and Zhongyu Wei. From individual to society: A survey on social simulation driven by large language model-based agents.arXiv preprint, arXiv:2412.03563, 2024

  9. [9]

    Position: LLM Social Simulations Are a Promising Research Method

    Jacy Reese Anthis, Ryan Liu, Sean M. Richardson, Austin C. Kozlowski, Bernard Koch, Erik Brynjolfsson, James A. Evans, and Michael S. Bernstein. Position: LLM social simulations are a promising research method. InForty-second International Conference on Machine Learning, ICML, 2025

  10. [10]

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang (Eric) Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W. White, Doug Burger, and Chi Wang. Autogen: Enabling next-gen llm applications via multi-agent conversation. 2023. URL https://api.semanticscholar.org/CorpusID:263611068

  11. [11]

    CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

    G. Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for "mind" exploration of large language model society. Advances in Neural Information Processing Systems 36, 2023. URL https://api.semanticscholar.org/CorpusID:268042527

  12. [12]

    Metagpt: Meta programming for a multi-agent collaborative framework

    Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zi Hen Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. Metagpt: Meta programming for a multi-agent collaborative framework. InInternational Conference on Learning Representations, 2023. URL https...

  13. [13]

    Chatdev: Communicative agents for software development

    Cheng Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, and Maosong Sun. Chatdev: Communicative agents for software development. InAnnual Meeting of the Association for Computational Linguistics, 2023. URL https://api.semanticscholar.org/CorpusID: 270257715

  14. [14]

    Out of One, Many: Using Language Models to Simulate Human Samples

    Lisa P. Argyle, E. Busby, Nancy Fulda, Joshua R Gubler, Christopher Rytting, and David Wingate. Out of one, many: Using language models to simulate human samples. Political Analysis, 31:337–351, 2022. URL https://api.semanticscholar.org/CorpusID:252280474

  15. [15]

    Simulating Social Media Using Large Language Models to Evaluate Alternative News Feed Algorithms

    Petter Törnberg, Diliara Valeeva, Justus Uitermark, and Christopher Bail. Simulating social media using large language models to evaluate alternative news feed algorithms. ArXiv, abs/2310.05984, 2023. URL https://api.semanticscholar.org/CorpusID:263831233

  16. [16]

    Epidemic modeling with generative agents

    Ross Williams, Niyousha Hosseinichimeh, Aritra Majumdar, and Navid Ghaffarzadegan. Epidemic modeling with generative agents. ArXiv, abs/2307.04986, 2023. URL https://api.semanticscholar.org/CorpusID:259766713

  17. [17]

    Econagent: Large language model-empowered agents for simulating macroeconomic activities

    Nian Li, Chen Gao, Mingyu Li, Yong Li, and Qingmin Liao. Econagent: Large language model-empowered agents for simulating macroeconomic activities. InAnnual Meeting of the Association for Computational Linguistics, 2023. URL https://api.semanticscholar. org/CorpusID:264146527

  18. [18]

    TradingGPT: Multi-Agent System with Layered Memory and Distinct Characters for Enhanced Financial Trading Performance

    Yang Li, Yangyang Yu, Haohang Li, Z. Chen, and Khaldoun Khashanah. Tradinggpt: Multi-agent system with layered memory and distinct characters for enhanced financial trading performance. 2023. URL https://api.semanticscholar.org/CorpusID:261582775

  19. [19]

    Whose Opinions Do Language Models Reflect?

    Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. Whose opinions do language models reflect? ArXiv, abs/2303.17548, 2023. URL https://api.semanticscholar.org/CorpusID:257834040

  20. [20]

    Synthetic Replacements for Human Survey Data? The Perils of Large Language Models

    James Bisbee, Joshua D. Clinton, Cassy Dorff, Brenton Kenkel, and Jennifer M. Larson. Synthetic replacements for human survey data? the perils of large language models.Political Analysis, 2024. URLhttps://api.semanticscholar.org/CorpusID:269845858

  21. [21]

    Large Language Models Show Human-Like Social Desirability Biases in Survey Responses

    Aadesh Salecha, Molly E. Ireland, Shashanka Subrahmanya, João Sedoc, Lyle H. Ungar, and Johannes C. Eichstaedt. Large language models show human-like social desirability biases in survey responses.arXiv preprint, arXiv:2405.06058, 2024

  22. [22]

    Large Language Models Assume People Are More Rational Than We Really Are

    Ryan Liu, Jiayi Geng, Joshua C. Peterson, Ilia Sucholutsky, and Thomas L. Griffiths. Large language models assume people are more rational than we really are. InThe Thirteenth International Conference on Learning Representations,ICLR. OpenReview.net, 2025. URL https://openreview.net/forum?id=dAeET8gxqg

  23. [23]

    Character-llm: A trainable agent for role-playing

    Yunfan Shao, Linyang Li, Junqi Dai, and Xipeng Qiu. Character-llm: A trainable agent for role-playing.ArXiv, abs/2310.10158, 2023. URL https://api.semanticscholar.org/ CorpusID:264145862

  24. [24]

    Rolellm: Benchmarking, eliciting, and enhancing role-playing abilities of large language models

    Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhang Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Wenhu Chen, Jie Fu, and Junran Peng. Rolellm: Benchmarking, eliciting, and enhancing role-playing abilities of large language models. InAnnual Meeting of the Association for Computat...

  25. [25]

    PersonaLLM: Investigating the Ability of Large Language Models to Express Big Five Personality Traits

    Hang Jiang, Xiajie Zhang, Xubo Cao, Cynthia Breazeal, and Jad Kabbara. Personallm: Investigating the ability of large language models to express big five personality traits. URL https://api.semanticscholar.org/CorpusID:265221392

  26. [26]

    InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews

    Xintao Wang, Yunze Xiao, Jen-Tse Huang, Siyu Yuan, Rui Xu, Haoran Guo, Quan Tu, Yaying Fei, Ziang Leng, Wei Wang, Jiangjie Chen, Cheng Li, and Yanghua Xiao. InCharacter: Evaluating personality fidelity in role-playing agents through psychological interviews. In Annual Meeting of the Association for Computational Linguistics, 2023. URL https://api.seman...

  27. [27]

    CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation

    Quan Tu, Shilong Fan, Zihang Tian, and Rui Yan. Charactereval: A chinese benchmark for role-playing conversational agent evaluation. In Annual Meeting of the Association for Computational Linguistics, 2024. URL https://api.semanticscholar.org/CorpusID:266725287

  28. [28]

    Better zero-shot reasoning with role-play prompting

    Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, and Xiaoxia Zhou. Better zero-shot reasoning with role-play prompting. InNorth American Chapter of the Association for Computational Linguistics, 2023. URL https://api.semanticscholar. org/CorpusID:260900230

  29. [29]

    In-Context Impersonation Reveals Large Language Models' Strengths and Biases

    Leonard Salewski, Stephan Alaniz, Isabel Rio-Torto, Eric Schulz, and Zeynep Akata. In-context impersonation reveals large language models' strengths and biases. ArXiv, abs/2305.14930, 2023

  30. [30]

    URL https://api.semanticscholar.org/CorpusID:258866192

  31. [31]

    Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

    Shashank Gupta, Vaishnavi Shrivastava, A. Deshpande, A. Kalyan, Peter Clark, Ashish Sabharwal, and Tushar Khot. Bias runs deep: Implicit reasoning biases in persona-assigned llms. ArXiv, abs/2311.04892, 2023. URL https://api.semanticscholar.org/CorpusID:265050702

  32. [32]

    Foundations of Social Theory

    James S. Coleman. Foundations of social theory. 1990. URL https://api.semanticscholar.org/CorpusID:145109282

  33. [33]

    Micromotives and Macrobehavior

    Thomas C. Schelling. Micromotives and macrobehavior. 1978. URL https://api.semanticscholar.org/CorpusID:143387748

  34. [34]

    Threshold Models of Collective Behavior

    Mark S. Granovetter. Threshold models of collective behavior. American Journal of Sociology, 83:1420–1443, 1978. URL https://api.semanticscholar.org/CorpusID:49314397

  35. [35]

    Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models

    Myra Cheng, Esin Durmus, and Dan Jurafsky. Marked personas: Using natural language prompts to measure stereotypes in language models.ArXiv, abs/2305.18189, 2023. URL https://api.semanticscholar.org/CorpusID:258960243

  36. [36]

    The Steerability of Large Language Models Toward Data-Driven Personas

    Junyi Li, Ninareh Mehrabi, Charith Peris, Palash Goyal, Kai-Wei Chang, A. G. Galstyan, Richard Zemel, and Rahul Gupta. The steerability of large language models toward data-driven personas. In North American Chapter of the Association for Computational Linguistics, 2023. URL https://api.semanticscholar.org/CorpusID:265067297

  37. [37]

    Personas as a Way to Model Truthfulness in Language Models

    Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, and He He. Personas as a way to model truthfulness in language models.ArXiv, abs/2310.18168, 2023. URL https: //api.semanticscholar.org/CorpusID:264555113

  38. [38]

    Where is the mind? persona vectors and llm individuation

    Pierre Beckmann and Patrick Butlin. Where is the mind? persona vectors and llm individuation

  39. [39]

    URL https://api.semanticscholar.org/CorpusID:287635493

  40. [40]

    The linear representation hypothesis and the geometry of large language models

    Kiho Park, Yo Joong Choe, and Victor Veitch. The linear representation hypothesis and the geometry of large language models. InInternational Conference on Machine Learning, 2023. URLhttps://api.semanticscholar.org/CorpusID:265042984

  41. [41]

    The Geometry of Categorical and Hierarchical Concepts in Large Language Models

    Kiho Park, Yo Joong Choe, Yibo Jiang, and Victor Veitch. The geometry of categorical and hierarchical concepts in large language models.ArXiv, abs/2406.01506, 2024. URL https://api.semanticscholar.org/CorpusID:270216615

  42. [42]

    Linear representations of hierarchical concepts in language models

    Masaki Sakata, Benjamin Heinzerling, Takumi Ito, Sho Yokoi, and Kentaro Inui. Linear representations of hierarchical concepts in language models. 2026. URL https://api. semanticscholar.org/CorpusID:287255958

  43. [43]

    Not All Language Model Features Are One-Dimensionally Linear

    Joshua Engels, Isaac Liao, Eric J. Michaud, Wes Gurnee, and Max Tegmark. Not all language model features are one-dimensionally linear. In International Conference on Learning Representations, 2024. URL https://api.semanticscholar.org/CorpusID:269983112

  44. [44]

    Linguistic regularities in continuous space word representations

    Tomas Mikolov, Wen tau Yih, and Geoffrey Zweig. Linguistic regularities in continuous space word representations. InNorth American Chapter of the Association for Computational Linguistics, 2013. URLhttps://api.semanticscholar.org/CorpusID:7478738

  45. [45]

    Toy Models of Superposition

    Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Thomas Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Baker Grosse, Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, and Chris Olah. Toy models of superposition.ArXiv, abs/2209.10652, 2022. URL https://api.semanticscholar.org/ Corpus...

  46. [46]

    Probing Classifiers: Promises, Shortcomings, and Advances

    Yonatan Belinkov. Probing classifiers: Promises, shortcomings, and advances.Computational Linguistics, 48:207–219, 2021. URL https://api.semanticscholar.org/CorpusID: 236924832

  47. [47]

    Sparse Autoencoders Find Highly Interpretable Features in Language Models

    Hoagy Cunningham, Aidan Ewart, Logan Riggs Smith, Robert Huben, and Lee Sharkey. Sparse autoencoders find highly interpretable features in language models.ArXiv, abs/2309.08600,

  48. [48]

    URL https://api.semanticscholar.org/CorpusID:261934663

  49. [49]

    Scaling and evaluating sparse autoencoders

    Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, and Jeffrey Wu. Scaling and evaluating sparse autoencoders. ArXiv, abs/2406.04093, 2024. URL https://api.semanticscholar.org/CorpusID:270286001

  50. [50]

    Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

    Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, and Aaron Mueller. Sparse feature circuits: Discovering and editing interpretable causal graphs in language models. ArXiv, abs/2403.19647, 2024. URL https://api.semanticscholar.org/CorpusID:268732732

  51. [51]

    Exploring task performance with interpretable models via sparse auto-encoders

    Shunyu Wang, Tyler Loakman, Youbo Lei, Yi Liu, Bohao Yang, Yuting Zhao, Dong Yang, and Chenghua Lin. Exploring task performance with interpretable models via sparse auto-encoders. ArXiv, abs/2507.06427, 2025. URL https://api.semanticscholar.org/CorpusID: 280066641

  52. [52]

    The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

    Christina Lu, Jack Gallagher, Jonathan Michala, Kyle Fish, and Jack Lindsey. The assistant axis: Situating and stabilizing the default persona of language models. arXiv preprint, arXiv:2601.10387, 2026

  53. [53]

    Steering Language Models With Activation Engineering

    Alexander Matt Turner, Lisa Thiergart, Gavin Leech, David Udell, Juan J. Vazquez, Ulisse Mini, and Monte MacDiarmid. Steering language models with activation engineering.arXiv preprint, arXiv:2308.10248, 2024

  54. [54]

    Representation Engineering: A Top-Down Approach to AI Transparency

    Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, and Dan Hendrycks. Representation engineering: A top-down approach to a...

  55. [55]

    Aligning large language models with human preferences through representation engineering

    Wenhao Liu, Xiaohua Wang, Muling Wu, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, and Xuanjing Huang. Aligning large language models with human preferences through representation engineering. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors,Proceedings of the 62nd Annual Meeting of the Association for Computat...

  56. [56]

    Steering llama 2 via contrastive activation addition

    Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, and Alexander Matt Turner. Steering llama 2 via contrastive activation addition. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors,Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL, pages 15504–15522. Association f...

  57. [57]

    Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

    Kenneth Li, Oam Patel, Fernanda B. Viégas, Hanspeter Pfister, and Martin Wattenberg. Inference-time intervention: Eliciting truthful answers from a language model. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Informati...

  58. [58]

    Refusal in Language Models Is Mediated by a Single Direction

    Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Rimsky, Wes Gurnee, and Neel Nanda. Refusal in language models is mediated by a single direction.ArXiv, abs/2406.11717,

  59. [59]

    URL https://api.semanticscholar.org/CorpusID:270560489

  60. [60]

    Persona Vectors: Monitoring and Controlling Character Traits in Language Models

    Runjin Chen, Andy Arditi, Henry Sleight, Owain Evans, and Jack Lindsey. Persona vectors: Monitoring and controlling character traits in language models.ArXiv, abs/2507.21509, 2025. URLhttps://api.semanticscholar.org/CorpusID:280337840

  61. [61]

    Improving activation steering in language models with mean-centring

    Ole Jorgensen, Dylan Cope, Nandi Schoots, and Murray Shanahan. Improving activation steering in language models with mean-centring.ArXiv, abs/2312.03813, 2023. URL https: //api.semanticscholar.org/CorpusID:266053529

  62. [62]

    A systematic analysis of the impact of persona steering on llm capabilities

    Jiaqi Chen, Ming Wang, Tingna Xie, Shi Feng, and Yongkang Liu. A systematic analysis of the impact of persona steering on llm capabilities. 2026. URL https://api.semanticscholar. org/CorpusID:287432603

  63. [63]

    Persona: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra

    Xiachong Feng, Liang Zhao, Weihong Zhong, Yi-Chong Huang, Yuxuan Gu, Lingpeng Kong, Xiaocheng Feng, and Bing Qin. Persona: Dynamic and compositional inference-time personality control via activation vector algebra. ArXiv, abs/2602.15669, 2026. URL https://api.semanticscholar.org/CorpusID:285659291

  64. [64]

    Introducing gpt-5.4 mini and nano

    OpenAI. Introducing gpt-5.4 mini and nano. https://openai.com/index/introducing-gpt-5-4- mini-and-nano/, 2026

  65. [65]

    Gemini 3.1 flash-lite model card

    Google DeepMind. Gemini 3.1 flash-lite model card. https://deepmind.google/models/model- cards/gemini-3-1-flash-lite/, 2026

  66. [66]

    LLM-Based Human Simulations Have Not Yet Been Reliable

    Qian Wang, Jiaying Wu, Zichen Jiang, Zhenheng Tang, Bingqiao Luo, Nuo Chen, Wei Chen, and Bingsheng He. Llm-based human simulations have not yet been reliable.arXiv preprint, arXiv:2501.08579, 2025

  67. [67]

    Character-llm: A trainable agent for role-playing

    Yunfan Shao, Linyang Li, Junqi Dai, and Xipeng Qiu. Character-llm: A trainable agent for role-playing. In Houda Bouamor, Juan Pino, and Kalika Bali, editors,Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP, pages 13153–13187. Association for Computational Linguistics, 2023

  68. [68]

    Rolellm: Benchmarking, eliciting, and enhancing role-playing abilities of large language models

    Noah Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Jian Yang, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Wenhao Huang, Jie Fu, and Junran Peng. Rolellm: Benchmarking, eliciting, and enhancing role-playing abilities of large language models. In Lun-Wei Ku, Andre Martins, and Vivek Sri...

  69. [69]

    Two tales of persona in llms: A survey of role-playing and personalization

    Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Wei-Lin Chen, Chao-Wei Huang, Yu Meng, and Yun-Nung Chen. Two tales of persona in llms: A survey of role-playing and personalization. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors,Findings of the Association for Computational Linguistics: EMNLP2024, pages 16612–16631. Association for Computation...

  70. [70]

    The Oscars of AI Theater: A Survey on Role-Playing with Language Models

    Nuo Chen, Yan Wang, Yang Deng, and Jia Li. The oscars of ai theater: A survey on role-playing with language models. arXiv preprint, arXiv:2407.11484, 2025

  71. [71]

    Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

    Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. Measuring and controlling instruction (in)stability in language model dialogs.arXiv preprint, arXiv:2402.10962, 2024

  72. [72]

    RNR: Teaching Large Language Models to Follow Roles and Rules

    Kuan Wang, Alexander Bukharin, Haoming Jiang, Qingyu Yin, Zhengyang Wang, Tuo Zhao, Jingbo Shang, Chao Zhang, Bing Yin, Xian Li, Jianshu Chen, and Shiyang Li. Rnr: Teaching large language models to follow roles and rules.arXiv preprint, arXiv:2409.13733, 2024

  73. [73]

    Representation Engineering for Large-Language Models: Survey and Research Challenges

    Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams- King, Linh Le, Kosi Asuzu, and Carsten Maple. Representation engineering for large-language models: Survey and research challenges.arXiv preprint, arXiv:2502.17601, 2025

  74. [74]

    Foundations of Social Theory

    James S. Coleman. Foundations of social theory. Harvard University Press, 1990

  75. [75]

    The Constitution of Society: Outline of the Theory of Structuration

    Anthony Giddens. The constitution of society: Outline of the theory of structuration. Univ of California Press, 1984

  76. [76]

    The Ecology of Human Development: Experiments by Nature and Design

    Urie Bronfenbrenner. The ecology of human development: Experiments by nature and design. Harvard University Press, 1979

  77. [77]

    System prompt template 1 (cf. Figure 4)

    You are {name}. {description}. Respond entirely from this perspective as a {level_name} entity. Do not mention being an AI

  78. [78]

    System prompt template 2 (cf. Figure 4)

    Please role-play as {name}: {description}. Stay fully in character and respond based on your lived experience and constraints

  79. [79]

    System prompt template 3 (cf. Figure 4)

    Act as {name}. {description}. All your responses should reflect the priorities, language, and worldview of someone operating at the {level_name} level of social organization

  80. [80]

    System prompt template 4 (cf. Figure 4)

    You are {name}, a {level_name} entity. {description}. Speak in first person. Your answers should reflect your unique scale of influence, time horizon, and decision-making logic

Showing first 80 references.