pith. machine review for the scientific record.

arxiv: 2509.19088 · v5 · submitted 2025-09-23 · 💻 cs.CY · cs.AI · cs.HC · stat.AP

Recognition: unknown

Digital Twins as Funhouse Mirrors: Five Key Distortions

Authors on Pith: no claims yet
classification: 💻 cs.CY · cs.AI · cs.HC · stat.AP
keywords: digital twin, responses, bias, development, distortions, five, human
Original abstract

Scientists and practitioners are increasingly moving to deploy digital twins (LLM-based models of real individuals) across social science and policy research. We conduct 19 pre-registered studies spanning 164 diverse outcomes (e.g., attitudes toward hiring algorithms, intentions to share misinformation), comparing human responses to those of their corresponding digital twins, which are trained on each individual's prior responses to over 500 questions. We establish an empirical benchmark for digital twin performance: their predictions are only modestly more accurate than those of a homogeneous base LLM and exhibit weak correlation with human responses (average $r = 0.20$). To inform future development, we identify five systematic distortions in digital twin behavior: (i) insufficient individuation, (ii) stereotyping, (iii) representation bias, (iv) ideological bias, and (v) hyper-rationality. Finally, we release our full dataset and code as a standardized testbed for evaluating and improving digital twin methodologies. Together, our findings caution against premature deployment while laying the groundwork for a transparent, replicable, and iterative science of responsible digital twin development.
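The abstract's headline benchmark is an average per-outcome Pearson correlation between human responses and twin predictions ($r = 0.20$). A minimal sketch of how such a metric could be computed follows; the function name, data shapes, and handling of constant columns are illustrative assumptions, not the authors' released code, which defines its own formats.

```python
import numpy as np

def average_outcome_correlation(human, twin):
    """Mean Pearson r across outcomes (columns).

    human, twin: arrays of shape (n_participants, n_outcomes),
    where each column holds all participants' answers to one
    survey outcome. Illustrative only.
    """
    rs = []
    for j in range(human.shape[1]):
        h, t = human[:, j], twin[:, j]
        # Pearson r is undefined when either column is constant.
        if h.std() == 0 or t.std() == 0:
            continue
        rs.append(np.corrcoef(h, t)[0, 1])
    return float(np.mean(rs))

# Toy example: twins that track humans only weakly should
# yield a small positive average correlation.
rng = np.random.default_rng(0)
human = rng.normal(size=(200, 5))
twin = 0.2 * human + rng.normal(size=(200, 5))
print(average_outcome_correlation(human, twin))
```

Averaging per-outcome correlations (rather than pooling all responses into one correlation) keeps outcomes with different scales from dominating the benchmark.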

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Rectification Difficulty and Optimal Sample Allocation in LLM-Augmented Surveys

    cs.AI · 2026-04 · unverdicted · novelty 7.0

    A method using predicted rectification difficulty for optimal human sample allocation in LLM-augmented surveys captures 61-79% of theoretical efficiency gains and reduces MSE by 11% on two datasets without pilot data.

  2. Adaptive Budget Allocation in LLM-Augmented Surveys

    cs.LG · 2026-04 · unverdicted · novelty 7.0

    An adaptive budget allocation algorithm for LLM-augmented surveys learns question-level LLM reliability on the fly from human labels and reduces labeling waste from 10-12% to 2-6% compared to uniform allocation.