The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models
Pith reviewed 2026-05-08 10:08 UTC · model grok-4.3
The pith
Language models encode social role granularity as a dominant latent direction from individual to institutional scale.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show that LLMs encode the granularity of social roles, from micro-level individual experience to macro-level organizational, institutional, or national reasoning, as a structured, ordered, and causally manipulable latent direction. We define a contrast-based Granularity Axis as the difference between mean macro- and micro-role hidden states. In Qwen3-8B this axis aligns with PC1 of the role representation space at cosine 0.972 and accounts for 52.6 percent of its variance. We construct 75 social roles across five granularity levels, collect 91,200 role-conditioned responses, extract role-level hidden states, and find that projections increase monotonically across levels, remain stable across layers, prompt variants, endpoint definitions, held-out splits, and score-filtered subsets, and transfer to Llama-3.1-8B-Instruct. Activation steering along the axis shifts response granularity in the predicted direction, confirming that the direction is causally relevant.
What carries the argument
The Granularity Axis, defined as the vector difference between the average hidden states of macro-scale roles and micro-scale roles, which functions as the primary geometric direction organizing the space of role representations.
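A minimal sketch of that construction, assuming role-level hidden states have already been extracted and averaged per role; the names, grouping, and shapes here are illustrative, not the authors' code:

```python
import numpy as np

# Contrast-based axis, assuming `hidden` maps each role name to its
# mean hidden-state vector. Role groupings are illustrative.
def granularity_axis(hidden, macro_roles, micro_roles):
    macro_mean = np.mean([hidden[r] for r in macro_roles], axis=0)
    micro_mean = np.mean([hidden[r] for r in micro_roles], axis=0)
    axis = macro_mean - micro_mean
    return axis / np.linalg.norm(axis)  # unit direction, macro-positive

def project_roles(hidden, axis):
    # Scalar projection of every role onto the axis; the paper's claim is
    # that these scores rise monotonically with assigned granularity level.
    return {role: float(vec @ axis) for role, vec in hidden.items()}
```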
If this is right
- Role hidden states project monotonically onto the axis across all five granularity levels and across prompt variants.
- The axis remains stable across model layers, endpoint choices, held-out data splits, and transfers to a second model.
- Positive or negative activation steering along the axis shifts generated response granularity in the predicted direction (a generic steering sketch follows this list).
- The two tested models show different degrees of steering controllability depending on their default behavior.
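For the steering bullet above, a generic activation-addition sketch in the style of contrastive activation addition; the layer index, coefficient, and Hugging Face-style module path are assumptions, not the paper's reported settings:

```python
import torch

def add_steering_hook(model, layer_idx, axis, alpha):
    # Add a scaled copy of the unit granularity axis to one decoder
    # layer's output during generation; positive alpha should push
    # responses toward the macro end of the scale.
    vec = torch.as_tensor(axis, dtype=model.dtype, device=model.device)

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * vec
        return (steered,) + output[1:] if isinstance(output, tuple) else steered

    # Assumes an HF-style decoder stack; adjust the path per model.
    return model.model.layers[layer_idx].register_forward_hook(hook)

# handle = add_steering_hook(model, layer_idx=20, axis=axis, alpha=8.0)
# ...generate, score granularity, then handle.remove()...
```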
Where Pith is reading between the lines
- This geometric structure could let practitioners read out or adjust the perspective scale of role-play outputs without rewriting prompts.
- The same contrast method might reveal other ordered dimensions such as time horizon or emotional intensity in role representations.
- If the axis generalizes, it offers a way to test whether models internally rank social contexts by scope rather than treating them as flat labels.
Load-bearing premise
The author-chosen 75 roles and five granularity levels cleanly separate micro-to-macro distinctions without other factors such as topic or response style driving the hidden-state patterns.
What would settle it
If role projections onto the axis fail to increase monotonically with assigned granularity level, or if activation steering along the axis produces no reliable shift in measured response granularity on held-out prompts, the central claim would not hold.
Original abstract
Large language models (LLMs) are routinely prompted to take on social roles ranging from individuals to institutions, yet it remains unclear whether their internal representations encode the granularity of such roles, from micro-level individual experience to macro-level organizational, institutional, or national reasoning. We show that they do. We define a contrast-based Granularity Axis as the difference between mean macro- and micro-role hidden states. In Qwen3-8B, this axis aligns with the principal axis (PC1) of the role representation space at cosine 0.972 and accounts for 52.6% of its variance, indicating that granularity is the dominant geometric axis organizing prompted social roles. We construct 75 social roles across five granularity levels and collect 91,200 role-conditioned responses over shared questions and prompt variants, then extract role-level hidden states and project them onto the axis. Role projections increase monotonically across all five levels, remain stable across layers, prompt variants, endpoint definitions, held-out splits, and score-filtered subsets, and transfer to Llama-3.1-8B-Instruct. The axis is also causally relevant: activation steering along it shifts response granularity in the predicted direction, with Llama moving from 2.00 to 3.17 on a five-point macro scale under positive steering on prompts that admit local responses. The two models differ in controllability, suggesting that steering depends on each model's default operating regime. Overall, our findings suggest that social role granularity is not merely a stylistic surface feature, but a structured, ordered, and causally manipulable latent direction in role-conditioned language model behavior.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLMs internally represent the granularity of prompted social roles (micro-level individuals to macro-level institutions) along a dominant latent direction. It constructs 75 roles across five granularity levels, collects 91,200 role-conditioned responses, defines a Granularity Axis as the mean difference between macro- and micro-role hidden states, shows this axis aligns with PC1 (cosine 0.972, 52.6% variance) in Qwen3-8B, produces monotonic projections across levels that are stable across layers/prompts/held-out splits/score-filtered subsets and transfer to Llama-3.1-8B-Instruct, and demonstrates causal relevance via activation steering that shifts response granularity scores in the predicted direction.
Significance. If the central geometric and causal claims hold after controls for confounds, the result would be significant: it identifies granularity as a structured, ordered, and manipulable latent direction organizing social roles in LLMs rather than a surface stylistic feature. Strengths include the scale of the dataset (91,200 responses), multiple stability checks (held-out splits, score-filtered subsets, endpoint variants), cross-model transfer, and the causal steering experiment that produces measurable shifts (e.g., Llama from 2.00 to 3.17). These elements provide reproducible empirical grounding for a falsifiable geometric hypothesis.
major comments (3)
- [§3] §3 (Role Construction and Data Collection): The 75 author-defined roles and five granularity levels are presented without reported balancing, matching, or regression controls for topic, average response length, or lexical/stylistic differences across levels. Because the Granularity Axis is defined directly from the mean hidden states of these roles and then shown to align with PC1, any correlation between these covariates and the level labels would make the observed alignment (cosine 0.972) and monotonic projections consistent with a composite direction rather than a pure granularity axis. This is load-bearing for the claim that granularity is the dominant organizing axis.
- [§4.2] §4.2 (PCA Alignment and Variance): The reported 52.6% variance explained by PC1 and its alignment with the author-defined axis would be more convincing with an explicit baseline comparison (e.g., variance explained by axes derived from random role groupings or shuffled granularity labels; a permutation sketch follows this list). Without this, it remains possible that the high alignment is partly an artifact of how the endpoint sets were chosen rather than evidence that granularity is uniquely dominant in the 75-role representation space.
- [§5] §5 (Activation Steering): The steering results (shift from 2.00 to 3.17 on the five-point scale) are promising but depend on the claim that the prompts 'admit local responses'; the paper should report how the five-point macro scale was applied by raters and whether inter-rater reliability or prompt selection criteria were pre-registered to ensure the granularity shift is not driven by changes in topic or length induced by the steering vector.
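One way to run the null baseline the second comment asks for; `H` holds the 75 role vectors row-wise, and all names are assumed rather than taken from the paper:

```python
import numpy as np

def pc1(H):
    # First principal component direction of the role space (unit norm).
    Hc = H - H.mean(axis=0)
    return np.linalg.svd(Hc, full_matrices=False)[2][0]

def null_alignments(H, n_macro, n_micro, n_perm=1000, seed=0):
    # Distribution of |cosine(axis, PC1)| when macro/micro endpoint sets
    # are drawn at random, against which the observed 0.972 can be judged.
    rng = np.random.default_rng(seed)
    pc = pc1(H)
    cosines = []
    for _ in range(n_perm):
        idx = rng.permutation(len(H))
        axis = H[idx[:n_macro]].mean(0) - H[idx[n_macro:n_macro + n_micro]].mean(0)
        cosines.append(abs(axis @ pc) / np.linalg.norm(axis))
    return np.array(cosines)
```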
minor comments (2)
- [§3] The definition of the Granularity Axis would benefit from an explicit equation (e.g., A = mean(H_macro) - mean(H_micro)) in the main text rather than only in prose; one possible rendering follows this list.
- [Figures 2-4] Figure captions for the projection plots should include the number of responses per level and any error bars or confidence intervals to allow visual assessment of the monotonicity strength.
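For the first minor comment, one way the requested equation could read, using only quantities the paper already defines (the macro/micro endpoint role sets and the role-mean hidden states):

```latex
\[
  \mathbf{A}
  = \frac{1}{\lvert \mathcal{R}_{\mathrm{macro}} \rvert}
      \sum_{r \in \mathcal{R}_{\mathrm{macro}}} \bar{\mathbf{h}}_r
  - \frac{1}{\lvert \mathcal{R}_{\mathrm{micro}} \rvert}
      \sum_{r \in \mathcal{R}_{\mathrm{micro}}} \bar{\mathbf{h}}_r ,
  \qquad
  p_r = \frac{\langle \bar{\mathbf{h}}_r, \mathbf{A} \rangle}{\lVert \mathbf{A} \rVert}.
\]
```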
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each of the major concerns point by point below, providing clarifications and committing to revisions where appropriate to strengthen the paper.
Point-by-point responses
- Referee: [§3] §3 (Role Construction and Data Collection): The 75 author-defined roles and five granularity levels are presented without reported balancing, matching, or regression controls for topic, average response length, or lexical/stylistic differences across levels. Because the Granularity Axis is defined directly from the mean hidden states of these roles and then shown to align with PC1, any correlation between these covariates and the level labels would make the observed alignment (cosine 0.972) and monotonic projections consistent with a composite direction rather than a pure granularity axis. This is load-bearing for the claim that granularity is the dominant organizing axis.
Authors: We appreciate this observation, as controlling for potential confounds is crucial for isolating the granularity effect. The roles were designed with shared questions across all granularity levels to control for topic, and prompt variants were used to increase robustness. However, we acknowledge that explicit balancing for response length and lexical features was not reported. In the revised manuscript, we will include: (i) statistics on average response lengths and token counts per granularity level, (ii) correlations between lexical metrics (e.g., type-token ratio) and level, and (iii) a regression analysis where we residualize the hidden states for these covariates before computing the axis and projections. This will demonstrate whether the axis remains aligned with PC1 after controls. We believe this addresses the concern without altering the core findings. revision: yes
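A compact version of the residualization the authors commit to above, with `H` as the (roles x dims) hidden-state matrix and `C` the nuisance covariates (length, lexical metrics); the names are assumed:

```python
import numpy as np

def residualize(H, C):
    # OLS of every hidden dimension on the covariates (plus intercept);
    # the residuals can be fed back into the axis computation to test
    # whether the PC1 alignment survives the controls.
    X = np.column_stack([np.ones(len(C)), C])
    beta, *_ = np.linalg.lstsq(X, H, rcond=None)
    return H - X @ beta
```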
- Referee: [§4.2] §4.2 (PCA Alignment and Variance): The reported 52.6% variance explained by PC1 and its alignment with the author-defined axis would be more convincing with an explicit baseline comparison (e.g., variance explained by axes derived from random role groupings or shuffled granularity labels). Without this, it remains possible that the high alignment is partly an artifact of how the endpoint sets were chosen rather than evidence that granularity is uniquely dominant in the 75-role representation space.
Authors: We agree that baseline comparisons are important to establish the uniqueness of the granularity axis. We will add to the revised §4.2 results from 1000 random role groupings (randomly assigning roles to micro/macro endpoints) and shuffled label permutations, computing the distribution of cosine similarities and variance explained. Our preliminary checks suggest the observed 0.972 cosine and 52.6% variance are outliers compared to these null distributions, supporting that granularity is the dominant direction. This addition will make the claim more robust. revision: yes
- Referee: [§5] §5 (Activation Steering): The steering results (shift from 2.00 to 3.17 on the five-point scale) are promising but depend on the claim that the prompts 'admit local responses'; the paper should report how the five-point macro scale was applied by raters and whether inter-rater reliability or prompt selection criteria were pre-registered to ensure the granularity shift is not driven by changes in topic or length induced by the steering vector.
Authors: We thank the referee for highlighting the need for more details on the evaluation. The five-point scale was applied by two independent raters who scored responses on granularity from 1 (micro/individual) to 5 (macro/institutional), with prompts selected as those allowing both local and global responses (e.g., questions about decision-making that can be answered personally or organizationally). In the revision, we will provide the full rating instructions, report inter-rater agreement (e.g., Pearson correlation or kappa), and confirm that steered responses did not significantly differ in length or topic from baseline (via manual inspection and metrics). While the experiment was not pre-registered, the selection criteria were based on pilot testing to ensure validity. To further address confounds, we will add analysis showing the steering primarily affects granularity-related content. revision: partial
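The agreement statistics mentioned in this response are standard; a sketch assuming two aligned lists of 1-5 scores (the library calls are real, the data names are placeholders):

```python
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

def rater_agreement(scores_a, scores_b):
    # Quadratic-weighted kappa penalizes large disagreements on the
    # ordinal 1-5 macro scale more heavily than adjacent ones.
    kappa = cohen_kappa_score(scores_a, scores_b, weights="quadratic")
    r, _ = pearsonr(scores_a, scores_b)
    return {"weighted_kappa": kappa, "pearson_r": r}
```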
Circularity Check
No significant circularity detected
Full rationale
The paper defines the Granularity Axis explicitly as the difference between mean macro-role and mean micro-role hidden states, then reports its empirical alignment with PC1 of the 75-role space (cosine 0.972, 52.6% variance) and monotonic projections of the intermediate levels as observed results. These are independent checks against the model's actual hidden-state geometry rather than facts that hold by construction. Steering experiments, cross-model transfer, and stability across splits provide further external validation. No self-citation chain, ansatz smuggling, or renaming of known results occurs; the analysis remains grounded in the collected role-conditioned activations.
Axiom & Free-Parameter Ledger
free parameters (2)
- Granularity level definitions and role assignments
- Micro and macro endpoint role sets
axioms (2)
- domain assumption: Hidden states extracted from role-conditioned prompts reflect the semantic granularity of the assigned role
- domain assumption: The contrast between mean macro and micro hidden states isolates granularity rather than correlated factors such as response style or topic
invented entities (1)
- Granularity Axis (no independent evidence)