pith. machine review for the scientific record.

arxiv: 2509.19088 · v5 · submitted 2025-09-23 · 💻 cs.CY · cs.AI · cs.HC · stat.AP

Recognition: unknown

Digital Twins as Funhouse Mirrors: Five Key Distortions

Authors on Pith: no claims yet
classification: 💻 cs.CY · cs.AI · cs.HC · stat.AP
keywords: digital twin, responses, bias, development, distortions, five, human
Original abstract

Scientists and practitioners are increasingly moving to deploy digital twins (LLM-based models of real individuals) across social science and policy research. We conduct 19 pre-registered studies spanning 164 diverse outcomes (e.g., attitudes toward hiring algorithms, intentions to share misinformation), comparing human responses to those of their corresponding digital twins, which are trained on each individual's prior responses to over 500 questions. We establish an empirical benchmark for digital twin performance: their predictions are only modestly more accurate than those of a homogeneous base LLM and exhibit weak correlation with human responses (average $r = 0.20$). To inform future development, we identify five systematic distortions in digital twin behavior: (i) insufficient individuation, (ii) stereotyping, (iii) representation bias, (iv) ideological bias, and (v) hyper-rationality. Finally, we release our full dataset and code as a standardized testbed for evaluating and improving digital twin methodologies. Together, our findings caution against premature deployment while laying the groundwork for a transparent, replicable, and iterative science of responsible digital twin development.
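The abstract's headline benchmark is an average per-outcome Pearson correlation between human responses and twin predictions ($r = 0.20$). A minimal sketch of how such a metric could be computed follows; the function name, data shapes, and handling of constant columns are illustrative assumptions, not the authors' released code, which defines its own formats.

```python
import numpy as np

def average_outcome_correlation(human, twin):
    """Mean Pearson r across outcomes (columns).

    human, twin: arrays of shape (n_participants, n_outcomes),
    where each column holds all participants' answers to one
    survey outcome. Illustrative only.
    """
    rs = []
    for j in range(human.shape[1]):
        h, t = human[:, j], twin[:, j]
        # Pearson r is undefined when either column is constant.
        if h.std() == 0 or t.std() == 0:
            continue
        rs.append(np.corrcoef(h, t)[0, 1])
    return float(np.mean(rs))

# Toy example: twins that track humans only weakly should
# yield a small positive average correlation.
rng = np.random.default_rng(0)
human = rng.normal(size=(200, 5))
twin = 0.2 * human + rng.normal(size=(200, 5))
print(average_outcome_correlation(human, twin))
```

Averaging per-outcome correlations (rather than pooling all responses into one correlation) keeps outcomes with different scales from dominating the benchmark.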

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Rectification Difficulty and Optimal Sample Allocation in LLM-Augmented Surveys

    cs.AI · 2026-04 · unverdicted · novelty 7.0

    A method using predicted rectification difficulty for optimal human sample allocation in LLM-augmented surveys captures 61-79% of theoretical efficiency gains and reduces MSE by 11% on two datasets without pilot data.

  2. Adaptive Budget Allocation in LLM-Augmented Surveys

    cs.LG · 2026-04 · unverdicted · novelty 7.0

    An adaptive budget allocation algorithm for LLM-augmented surveys learns question-level LLM reliability on the fly from human labels and reduces labeling waste from 10-12% to 2-6% compared to uniform allocation.