arxiv: 2605.19798 · v1 · pith:KHZDVY6Unew · submitted 2026-05-19 · 💻 cs.CL

Towards Trust Calibration in Socially Interactive Agents: Investigating Gendered Multimodal Behaviors Generation with LLMs

Pith reviewed 2026-05-20 06:33 UTC · model grok-4.3

classification 💻 cs.CL
keywords socially interactive agents · trust calibration · large language models · multimodal behavior generation · gender stereotypes · ability and benevolence · trustworthiness dimensions

The pith

Large language models can generate coherent multimodal behaviors reflecting different levels of ability and benevolence for social agents, while reproducing gender stereotypes when gender is specified.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper explores using large language models to automatically create behaviors for socially interactive agents that vary in ability and benevolence, two core dimensions of trustworthiness. This matters because calibrated trust could help users interact with agents at appropriate levels of reliance instead of over-trusting or under-using them. The authors test a prompting method to produce aligned outputs in text, intonation, facial expression, and gesture. Analysis of generated data and a user study show the behaviors match theoretical expectations and intended trait levels. The work also reveals that adding gender to prompts causes the models to link male agents with high ability and female agents with high benevolence.

Core claim

GPT-5.4 produces coherent multimodal behaviors across text, intonation, facial expression, and gesture that align with theoretical expectations for ability and benevolence. Random Forest feature importance confirms this alignment. When gender is specified in prompts, the outputs reproduce societal stereotypes, associating male agents with high ability and female agents with high benevolence. A within-subjects user study on Prolific confirms that participants perceive different levels of ability and benevolence in line with the prompt instructions.

What carries the argument

A prompt-based method for automatically generating multimodal behaviors aligned with specific levels of ability and benevolence, which produces outputs in verbal, vocal, gestural, and facial modalities.
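A minimal sketch of how such a prompt grid could be assembled, assuming two levels per trait and an optional gender field. The level labels, the prompt wording, and the names `build_prompt` and `prompt_grid` are illustrative assumptions; the paper's actual prompt text is not reproduced here.

```python
from itertools import product
from typing import Optional

# All names and wording below are illustrative assumptions,
# not the paper's actual prompt text or level labels.
LEVELS = ("low", "high")
MODALITIES = ("text", "intonation", "facial expression", "gesture")


def build_prompt(ability: str, benevolence: str, gender: Optional[str] = None) -> str:
    """Return one hypothetical generation prompt for a trait combination."""
    if ability not in LEVELS or benevolence not in LEVELS:
        raise ValueError("trait levels must be one of " + ", ".join(LEVELS))
    persona = f"a {gender} agent" if gender else "an agent"
    return (
        f"Write a short utterance for {persona} showing {ability} ability "
        f"and {benevolence} benevolence. Annotate it for: {', '.join(MODALITIES)}."
    )


def prompt_grid(genders=(None,)):
    """Cross every ability level with every benevolence level and gender option."""
    return {
        (a, b, g): build_prompt(a, b, g)
        for a, b, g in product(LEVELS, LEVELS, genders)
    }
```

Crossing the two traits fully, rather than varying them together, is what would let an analysis attribute a behavioral difference to one dimension at a time. Calling `prompt_grid((None, "male", "female"))` would add the gendered conditions to the same grid.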

If this is right

  • Multimodal behaviors generated this way could support trust calibration in socially interactive agents.
  • LLMs can control specific trustworthiness dimensions through targeted prompting across modalities.
  • Including gender in prompts for behavior generation leads to stereotypical associations in the outputs.
  • User perceptions of the generated behaviors match the designed levels of ability and benevolence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending the method to other traits or contexts could support more varied and context-appropriate agent personalities.
  • The gender stereotype pattern points to the value of testing debiasing prompts or post-processing steps before deployment.
  • Real-world interaction tests would reveal whether these generated behaviors actually produce better-calibrated trust and usage decisions.
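One way to make the debiasing idea above testable is to track a single number before and after an intervention: the gap between how often male-prompted and female-prompted outputs are labeled high on a trait. The sketch below assumes each generated behavior has already been labeled (by a classifier or by annotators); the record format and the toy data are made up for illustration and are not results from the paper.

```python
from collections import defaultdict


def high_rate_by_group(records):
    """records: iterable of (group, is_high) pairs; returns {group: share labeled high}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_high, n_total]
    for group, is_high in records:
        counts[group][0] += int(is_high)
        counts[group][1] += 1
    return {group: high / total for group, (high, total) in counts.items()}


def rate_gap(records, group_a, group_b):
    """Difference in high-label rates between two groups (0 means no gap)."""
    rates = high_rate_by_group(records)
    return rates[group_a] - rates[group_b]


# Toy, made-up labels purely to exercise the functions.
toy_ability_labels = [
    ("male", True), ("male", True), ("male", False), ("male", True),
    ("female", True), ("female", False), ("female", False), ("female", False),
]
print(rate_gap(toy_ability_labels, "male", "female"))
```

A debiasing prompt or post-processing step would count as helping under this measure if it moves the gap toward zero on held-out generations without collapsing the intended trait differences.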

Load-bearing premise

The prompts to the LLMs can isolate and control the intended levels of ability and benevolence without other uncontrolled factors shaping the outputs or how people perceive them.

What would settle it

A follow-up analysis or user study in which the generated behaviors show no statistical alignment with theoretical expectations for ability and benevolence, or in which participants fail to perceive the intended trait differences.
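Because the user study is within-subjects, "fail to perceive the intended trait differences" can be framed as a test on paired differences (each participant's rating of a high-trait behavior minus their rating of a low-trait one). The sketch below is an exact sign-flip permutation test, chosen here only as an assumption-light illustration; it is not the analysis the paper reports, and it enumerates all sign patterns, so it is practical only for small samples.

```python
from itertools import product
from statistics import mean


def sign_flip_p_value(differences):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis of no perceived difference, each paired
    difference is equally likely to be positive or negative.
    """
    observed = abs(mean(differences))
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(differences)):
        flipped = mean(s * d for s, d in zip(signs, differences))
        # Small tolerance so float rounding does not drop exact ties.
        n_extreme += abs(flipped) >= observed - 1e-12
        n_total += 1
    return n_extreme / n_total
```

A large p-value here would be the "no statistical alignment" outcome described above; a small one says only that the direction of the difference is unlikely under chance, not how large it is.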

Figures

Figures reproduced from arXiv: 2605.19798 by Chloé Clavel, Lucie Galland, Magalie Ochs.

Figure 1: Methodology for the study of multimodal behaviors generated by Large Language Models
Figure 1: First, we present our method to generate, from LLMs, [...]
Figure 2: Most important features in random forest classification of ability and benevolence levels. Features are ranked by [...]
Figure 3: Most important features in Random Forest classification of gender for male-generated behaviors (symmetrical patterns [...]
Original abstract

As Socially Interactive Agents (SIAs) become increasingly integrated into daily life, the ability to calibrate user trust to an agent's actual capabilities would help ensure appropriate usage of these agents. In this paper, we explore the capacity of Large Language Models (LLMs) to generate multimodal behaviors (verbal, vocal, gestural, and facial expression modalities) that reflect varying levels of ability and benevolence, two key dimensions of trustworthiness. We propose a novel method for automatically generating behaviors aligned with specific levels of these traits, a first step towards enabling nuanced and trust-calibrated interactions. By analyzing a large dataset of multimodal transcripts generated by LLMs, we demonstrate that GPT-5.4 is able to produce coherent behavior across different modalities (text, intonation, facial expression, and gesture). Using Random Forest feature importance analysis, we show that the generated behaviors align with theoretical expectations for ability and benevolence. However, we also find that when gender is specified in the prompt, LLMs tend to reproduce societal gender stereotypes, associating male agents' behaviors with high ability and female agents' behaviors with high benevolence. To validate our approach, we conducted a user study on Prolific using a within-subjects design. Participants perceived different levels of ability and benevolence in the generated behaviors align with the intended instructions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper explores using LLMs (specifically GPT-5.4) to generate multimodal behaviors (text, intonation, facial expression, gesture) for socially interactive agents that reflect varying levels of ability and benevolence. It proposes a prompt-based generation method, analyzes a large dataset of outputs with Random Forest feature importance to claim alignment with trustworthiness theory, reports that gender-specified prompts reproduce societal stereotypes (male agents high-ability, female high-benevolence), and validates via a within-subjects Prolific user study that participants perceive the intended trait levels.

Significance. If the central claims hold after addressing confounds, the work offers a practical step toward automated trust calibration in SIAs and surfaces important gender-bias issues in LLM multimodal generation. The empirical pipeline combining large-scale generation, feature analysis, and human judgment data provides a replicable template, though its value hinges on demonstrating that outputs are driven by the targeted trait dimensions rather than prompt artifacts.

major comments (3)
  1. [Methods (prompt-based generation)] The prompt-based generation method (described in the methods for aligning behaviors with specific ability/benevolence levels) assumes prompts can isolate these traits independently. However, without explicit controls or ablation tests for correlated prompt wording, default model biases, or implicit cross-modal consistency rules, the Random Forest alignment may reflect prompt artifacts rather than theoretical mapping, undermining both the coherence claim and the gender-stereotype interpretation.
  2. [Results (Random Forest analysis)] In the Random Forest feature importance analysis, the reported alignment with ability and benevolence theory lacks detail on the exact multimodal features extracted, baseline comparisons (e.g., neutral or random prompts), or cross-validation metrics. This makes it impossible to assess whether importance scores genuinely track the intended dimensions or simply capture surface-level prompt elements.
  3. [User Study] The user study section reports that participant perceptions align with intended instructions but provides no sample size, statistical tests, effect sizes, or confidence intervals. These omissions prevent evaluation of whether the within-subjects design reliably validates the generation method or merely shows weak directional trends.
minor comments (3)
  1. [Abstract] The abstract contains a grammatical error: 'Participants perceived different levels of ability and benevolence in the generated behaviors align with the intended instructions' should be rephrased for clarity.
  2. The model is referred to as 'GPT-5.4'; clarify whether this is a hypothetical future model, a specific fine-tuned variant, or a typographical reference to an existing GPT-4 variant to avoid reader confusion.
  3. [Results] Figure or table captions for the Random Forest results should explicitly list the top features per modality and their importance scores to improve interpretability.
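The stability question raised in major comment 2 can be made concrete with a rank correlation between the feature-importance vectors obtained on different cross-validation folds. The sketch below uses only the standard library and assumes the importances have already been computed elsewhere (for example by a Random Forest implementation); the function names and the idea of averaging pairwise Spearman correlations are illustrative assumptions, not the paper's procedure.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def ranking_stability(importances_per_fold):
    """Mean pairwise Spearman correlation across folds (1.0 = identical rankings)."""
    return mean(spearman(a, b) for a, b in combinations(importances_per_fold, 2))
```

Values near 1.0 would suggest the same features dominate regardless of the data split. Comparing the same statistic for neutral or random-prompt baselines would help separate trait-driven features from surface-level prompt elements.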

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which identifies key areas where additional rigor and transparency will strengthen our claims about LLM-generated multimodal behaviors for trust calibration in SIAs. We address each major comment below and commit to revisions that directly respond to the concerns raised.

  1. Referee: [Methods (prompt-based generation)] The prompt-based generation method (described in the methods for aligning behaviors with specific ability/benevolence levels) assumes prompts can isolate these traits independently. However, without explicit controls or ablation tests for correlated prompt wording, default model biases, or implicit cross-modal consistency rules, the Random Forest alignment may reflect prompt artifacts rather than theoretical mapping, undermining both the coherence claim and the gender-stereotype interpretation.

    Authors: We agree that the absence of explicit ablation tests leaves open the possibility that observed alignments partly reflect prompt artifacts or model biases rather than a clean mapping to ability and benevolence. In the revised manuscript we will add a dedicated ablation subsection that (a) systematically varies prompt phrasing while holding trait levels constant, (b) compares outputs against neutral and random-prompt baselines, and (c) examines cross-modal consistency rules. These analyses will be used to qualify both the coherence results and the gender-stereotype findings, making clear which effects persist after controlling for surface-level prompt elements. revision: yes

  2. Referee: [Results (Random Forest analysis)] In the Random Forest feature importance analysis, the reported alignment with ability and benevolence theory lacks detail on the exact multimodal features extracted, baseline comparisons (e.g., neutral or random prompts), or cross-validation metrics. This makes it impossible to assess whether importance scores genuinely track the intended dimensions or simply capture surface-level prompt elements.

    Authors: We acknowledge that the current description of the Random Forest analysis is insufficiently detailed for readers to evaluate the source of the reported feature importances. The revised version will expand this section to list all extracted multimodal features (lexical, prosodic, facial, and gestural descriptors), include explicit baseline comparisons with neutral and random-prompt conditions, and report 5-fold cross-validation performance together with stability metrics for the importance rankings. These additions will allow direct assessment of whether the importance scores reflect the targeted theoretical dimensions. revision: yes

  3. Referee: [User Study] The user study section reports that participant perceptions align with intended instructions but provides no sample size, statistical tests, effect sizes, or confidence intervals. These omissions prevent evaluation of whether the within-subjects design reliably validates the generation method or merely shows weak directional trends.

    Authors: We thank the referee for noting these reporting gaps. The revised manuscript will supply the exact sample size, the statistical tests performed (repeated-measures ANOVA or paired comparisons), effect sizes, and 95% confidence intervals for the key contrasts. These additions will enable readers to judge the reliability and magnitude of the alignment between intended trait levels and participant perceptions. revision: yes
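For the paired contrasts promised in response 3, an effect size and interval could be reported along the following lines. This is a hedged sketch: it computes Cohen's d for paired data (mean difference divided by the standard deviation of the differences) and a normal-approximation confidence interval for the mean difference. The normal approximation is an assumption made to stay within the standard library; with small samples a t-based interval would be wider. The function name and ratings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_summary(high, low, confidence=0.95):
    """Summarize paired ratings: mean difference, Cohen's d_z, approximate CI."""
    if len(high) != len(low) or len(high) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [h - l for h, l in zip(high, low)]
    m = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical; d_z is undefined")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sd / sqrt(len(diffs))
    return {"mean_diff": m, "d_z": m / sd, "ci": (m - half_width, m + half_width)}
```

An interval that excludes zero together with a sizeable `d_z` would address the referee's concern about "weak directional trends" more directly than a significance test alone.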

Circularity Check

0 steps flagged

No significant circularity in empirical LLM behavior generation study

The paper is an empirical study that generates multimodal behavior transcripts via LLM prompts specifying ability/benevolence levels, analyzes them with Random Forest feature importance, and validates via a human user study on Prolific. No equations, derivations, or mathematical claims exist. No self-citation chains or ansatzes reduce any result to its own inputs by construction. The central claims rest on external human judgments and theoretical expectations rather than internal fitting or renaming, making the work self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

No free parameters or invented entities are introduced in the abstract; the work builds on established concepts in trust research and LLM capabilities.

axioms (1)
  • domain assumption: Ability and benevolence are key dimensions of trustworthiness.
    Invoked as the basis for generating behaviors reflecting varying levels of these traits.

pith-pipeline@v0.9.0 · 5768 in / 1374 out tokens · 70121 ms · 2026-05-20T06:33:26.084777+00:00 · methodology

Prompt template (appendix excerpt as extracted; truncations kept)

Definition fragment: the extent to which a trustee is believed to want to do good to the trustor, aside from an egocentric profit motive

Role: You are a High-Fidelity Multimodal Persona Engine. You specialize in translating psychological frameworks into synchronized verbal and...

Tag Syntax & Placement.
  • Audio Tags: [tag]. Place immediately before or after the dialogue segment. Focus only on vocal delivery or non-verbal vocal sounds.
  • Facial Tags: f: expression. Place at the exact moment the facial expression should trigger.
  • Gesture Tags: g: gesture. Place at the exact moment the physical movement should begin.
  • Emphasis: Use CAPITA...

Approved Tag Lists. [List of approved tags]

Workflow.
  • Analyze Personality: Read the Ability scores.
  • Create the text: Match the text with the provided intention, ability score, and oral style. The text is going to be read.
  • Keep the text short: 3 sentences at most.
  • Apply Facial/Gesture Tags: Insert f: and g: tags where the movement naturally starts.
  • Apply Audio Tags: Insert [] tags to guide the voice ...