LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals
Machine learning can predict human behavior well when substantial structured data and well-defined outcomes are available, but these models are typically limited to specific outcomes and cannot readily be applied to new domains. We test whether large language models (LLMs) can support a more general-purpose approach by building person-specific simulations (i.e., "generative agents") grounded in self-report data. Using data from a diverse national sample of 1,052 Americans, we build agents from (i) two-hour, semi-structured interviews (elicited using the American Voices Project interview schedule), (ii) structured surveys (the General Social Survey and Big Five personality inventory), or (iii) both sources combined. On held-out General Social Survey items, agent accuracy reached 83% (interview only), 82% (surveys only), and 86% (combined) of participants' two-week test-retest consistency, compared with agents prompted only with individuals' demographics (74%). Agents predicted personality traits and behaviors in experiments with similar accuracy, and reduced disparities in accuracy across racial and ideological groups relative to demographics-only baselines. Together, these results show that LLM agents grounded in rich qualitative or quantitative self-report data can support general-purpose simulation of individuals across outcomes, without requiring task-specific training data.
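The headline numbers are normalized scores, not raw accuracies: each agent's agreement with a participant is divided by how consistently that participant answers the same items two weeks apart. Below is a minimal Python sketch of that metric, assuming categorical survey responses; the function and variable names are illustrative, not from the authors' code.

```python
# Sketch of the normalized-accuracy metric described in the abstract.
# Assumes categorical responses encoded as comparable values; all
# names are illustrative, not the authors' actual implementation.

def raw_accuracy(pred, truth):
    """Fraction of items on which two response vectors agree."""
    assert len(pred) == len(truth)
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def normalized_accuracy(agent_answers, wave1_answers, wave2_answers):
    """Agent-vs-participant agreement, scaled by the participant's own
    two-week test-retest consistency. A score of 1.0 means the agent
    predicts the participant as well as the participant replicates
    their own answers."""
    return (raw_accuracy(agent_answers, wave1_answers)
            / raw_accuracy(wave2_answers, wave1_answers))

# Toy example: the agent matches 6 of 8 wave-1 answers (75% raw); the
# participant replicates 7 of 8 answers two weeks later (87.5%),
# giving 0.75 / 0.875, about 0.86, the scale the paper reports on.
agent = [1, 0, 1, 1, 0, 1, 0, 1]
wave1 = [1, 0, 1, 0, 0, 1, 1, 1]
wave2 = [1, 0, 1, 0, 0, 1, 0, 1]
print(round(normalized_accuracy(agent, wave1, wave2), 2))  # 0.86
```

Normalizing this way makes the demographics-only baseline (74%) and the combined agents (86%) directly comparable even though participants differ in how stable their own answers are.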
Forward citations
Cited by 20 Pith papers
- ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles
  ScioMind combines anchoring-based belief updates, hierarchical memory, and dynamic profiles in LLM multi-agent systems to produce more stable, diverse, and psychologically aligned opinion trajectories than prior fixed...
- Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors
  A clustering and divergence method reveals a large distributional gap between real and LLM-simulated user behaviors on coding and writing tasks, partially closed by combining complementary simulators.
- PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
  Persona-driven workflow and interface improve automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.
- WhatIf: Interactive Exploration of LLM-Powered Social Simulations for Policy Reasoning
  WhatIf provides an interactive platform for real-time exploration of LLM-driven social simulations, enabling policymakers to iteratively test plans, reflect on assumptions, and uncover vulnerabilities in emergency pre...
- IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics
  IntervenSim is an intervention-aware social network simulation that couples source interventions with crowd interactions in a feedback loop, improving MAPE by 41.6% and DTW by 66.9% over prior static frameworks on rea...
- PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior
  PrivacySIM shows that conditioning LLMs on user personas like demographics and attitudes improves simulation of privacy choices but reaches only 40.4% accuracy against real responses from 1,000 users.
- Post-training makes large language models less human-like
  Post-training reduces LLMs' behavioral alignment with humans across families and sizes, with the misalignment increasing in newer generations while persona induction fails to improve individual-level predictions.
- PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
  PersonaTeaming Workflow improves automated red-teaming attack success rates over RainbowPlus using personas while maintaining diversity, and PersonaTeaming Playground supports human-AI collaboration in red-teaming as ...
- The Collapse of Heterogeneity in Silicon Philosophers
  Large language models collapse philosophical heterogeneity by over-correlating judgments across domains, creating artificial consensus unlike the views of 277 professional philosophers.
- CHORUS: An Agentic Framework for Generating Realistic Deliberation Data
  CHORUS generates realistic deliberation discussions via LLM agents with memory and Poisson-timed participation, validated by 30 experts on realism, coherence, and utility.
- Behavioral Transfer in AI Agents: Evidence and Privacy Implications
  AI agents on Moltbook reflect the specific behavioral traits of their linked human owners across multiple dimensions, with stronger transfer linked to greater privacy risks.
- In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores
  Standardized-test benchmarks for LLM fairness are unreliable because prompt wording alone drives most score variance and ranking changes, while a multi-agent conversational framework reveals consistent model-specific ...
- Explicit Trait Inference for Multi-Agent Coordination
  ETI lets LLM agents infer and track partners' psychological traits (warmth and competence) from histories, cutting payoff loss 45-77% in games and boosting performance 3-29% on MultiAgentBench versus CoT baselines.
- Why Expert Alignment Is Hard: Evidence from Subjective Evaluation
  Expert alignment in subjective LLM evaluations is difficult because expert judgments are heterogeneous, partly tacit, dimension-dependent, and temporally unstable.
- Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception
  Persona prompting creates stable but minimally differentiated LLM behavior on urban sentiment tasks, with a no-persona baseline frequently matching or exceeding persona-conditioned agreement to human labels.
- JudgeMeNot: Personalizing Large Language Models to Emulate Judicial Reasoning in Hebrew
  A pipeline using causal language modeling and synthetic instruction-tuning personalizes LLMs to replicate individual Hebrew judges' reasoning, outperforming baselines on similarity metrics with outputs indistinguishab...
- Imperfectly Cooperative Human-AI Interactions: Comparing the Impacts of Human and AI Attributes in Simulated and User Studies
  In real human subjects, AI transparency impacts imperfectly cooperative interactions far more than personality traits, unlike simulations where both are comparably influential.
- AI and Collective Decisions: Strengthening Legitimacy and Losers' Consent
  An AI system that elicits personal experiences and visualizes policy support increased perceived legitimacy and perspective-taking in collective decisions despite unfavorable outcomes.
- Network Effects and Agreement Drift in LLM Debates
  LLM agents in controlled network debates show agreement drift toward specific opinion positions, requiring separation of structural effects from LLM biases before using them as human behavioral proxies.
- We Need Strong Preconditions For Using Simulations In Policy
  Societal-scale LLM agent simulations for policy need three preconditions: avoid neutral treatment of marginalized population simulations, require population participation, ensure accountability, plus development and d...