Can LLM “self-report
3 Pith papers cite this work.
Citing papers by year: 2026 (3)
Representative citing papers
- When Robots Rate Their Own Interactions: Engagement Validity and the Strangeness Failure
  LLM robots match humans on engagement ratings in HRI questionnaires but systematically invert strangeness/comfort dimensions across models and live interactions.
- Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior
  LLM self-reports predict behavior selectively: TPB reaches human-level coherence within shared conversations but collapses across sessions for primed behaviors, unlike Big 5, with persona prompting stabilizing reports but not actions.
- The Unsampled Truth: Psychometrics in SLMs Measure Prompt Artifacts, Not Psychological Constructs
  SLM responses to psychometric prompts are dominated by prompt artifacts such as personas and option symbols rather than semantic understanding of psychological constructs.