pith. machine review for the scientific record.

cs.HC

Human-Computer Interaction

Covers human factors, user interfaces, and collaborative computing. Roughly includes material in ACM Subject Classes H.1.2 and all of H.5, except for H.5.1, which is more likely to have Multimedia as the primary subject area.

cs.HC 2026-05-13 2 theorems

AI goals score higher on SMART but cut follow-through

Optimized but Unowned: How AI-Authored Goals Undermine the Motivation They Are Meant to Drive

Ownership loss explains why fewer people act when the model writes the goals instead of the person.

abstract
As AI tools become embedded in productivity and self-improvement contexts, a pressing question emerges: what happens when AI does the goal-setting for us? While large language models can generate goals that are objectively well-formed, the motivational consequences of delegating this cognitively and emotionally significant task remain unknown. In a preregistered experiment (N = 470), we compared self-authored goals against LLM-authored goals derived from a personal reflection. Although LLM-generated goals scored higher on SMART criteria (specificity, measurability, achievability, relevance, and time-boundedness; d = 2.26), participants in the LLM condition reported lower psychological ownership (d = 1.38), commitment (d = 1.19), and perceived importance (d = 1.13). At two-week follow-up, 72.8% of self-authored participants had acted on two or more of their goals, compared to 46.6% in the LLM condition. Mediation analyses identified psychological ownership as the mechanism: it mediated the authorship effect on every downstream motivational and behavioral outcome, while objective goal quality did not. Critically, individuals low in trait self-efficacy, those most likely to seek AI assistance, experienced the steepest ownership erosion. These findings reveal a quality-motivation dissociation in AI-assisted goal-setting and identify authorship preservation as a design priority for AI tools deployed in identity-relevant, behavior-dependent tasks.
cs.HC 2026-05-13 Recognition

Uncertainty visuals raise annotation quality and speed

From Model Uncertainty to Human Attention: Localization-Aware Visual Cues for Scalable Annotation Review

A 120-person study shows cues redirect effort to likely mislocalized boxes, cutting time while catching more errors.

abstract
High-quality labeled data is essential for training robust machine learning models, yet obtaining annotations at scale remains expensive. AI-assisted annotation has therefore become standard in large-scale labeling workflows. However, in tasks where model predictions carry two independent components, a class label and spatial boundaries, a model may classify an object with high confidence while mislocalizing it. Existing AI-assisted workflows offer annotators no signal about where spatial errors are most likely. Without such guidance, humans may systematically underinspect subtly misplaced boxes. We address this by studying the effect of visualizing spatial uncertainty via a purpose-built interface. In a controlled study with 120 participants, those receiving uncertainty cues achieve higher label quality while being faster overall. A box-level analysis confirms that the cues redirect annotator effort toward high-uncertainty predictions and away from well-localized boxes. These findings establish localization uncertainty as a lever to improve human-in-the-loop annotation. Code is available at https://mos-ks.github.io/MUHA/.
cs.HC 2026-05-13 Recognition

Robot execution and AI chat boost student code reflection

RoboBlockly Studio: Conversational Block Programming with Embodied Robot Feedback for Computational Thinking

The system creates quick cycles of writing blocks, watching a robot move, and talking with AI to clarify thinking without taking over.

abstract
Computational thinking (CT) is increasingly promoted as a core literacy, yet learners and teachers face challenges in connecting abstract program logic to meaningful outcomes. We design and evaluate RoboBlockly Studio, an integrated interactive system that combines block-based programming, a conversational AI teaching agent, and embodied robot execution. RoboBlockly Studio creates a tight iterative loop of authoring, running, observing, and revising. Informed by interviews with five programming teachers, the system was designed to support four goals: (1) preserving learner agency in computational thinking, (2) making program behavior transparent and interpretable, (3) grounding programming in embodied, classroom-aligned tasks, and (4) scaffolding reflection through pedagogically grounded AI dialogue. We deployed RoboBlockly Studio with 32 high school students, observing how robot and AI feedback influenced students' interactions with code, reflections on problem-solving strategies, and understanding of CT concepts. We discuss design insights and implications for creating interactive, embodied learning environments that integrate AI and robotics to support CT learning in computing education.
cs.HC 2026-05-13 2 theorems

AI turns space history into shareable future news

COSMIC 1001: Engaging Future Speculation on Space Exploration with Generative AI

Cosmic 1001 lets users query possible missions and watch their generated stories form a collective landscape.

abstract
Cosmic 1001 is an interactive installation that transforms space exploration history into a speculative news experience. Participants first browse a news-based archive of major space events, then pose future-oriented questions or specify conditions such as year, celestial body, or mission name. In response, AI generates a future news item including a headline, article, narration, and visual media. These outputs are accumulated in the Future Tunnel, a shared visualization where individual stories form a collective landscape of possible futures. By combining historical space events with science fiction references, the installation explores a space between documentation and imagination, treating the future not as a fixed prediction but as a visible and discussable speculation.
cs.HC 2026-05-13 Recognition

AI turns The Thinker into Eastern ink symbols

Ink Spiral: Symbolic Transformation from The Thinker to the Four Gentlemen

A video installation morphs a Western sculpture into plum, orchid, bamboo, and chrysanthemum across thousands of frames, turning fixed cultural icons into a fluid dialogue.

abstract
Western art has regarded The Thinker as a symbol of rational contemplation, while Eastern aesthetics has taken the Four Gentlemen, namely plum, orchid, bamboo, and chrysanthemum, as symbols of moral and spiritual cultivation. This paper presents Ink Spiral, a video installation that links these traditions through AI generated ink imagery. By transforming a rotating sculpture of The Thinker into the Four Gentlemen across thousands of frames, the work shifts between three dimensional sculpture and two dimensional ink, human introspection and natural symbolism. Ink Spiral turns fixed cultural icons into a fluid dialogue, inviting audiences to perceive cross cultural connection as a living, ambiguous, and endlessly interpretable creative state.
cs.HC 2026-05-13 Recognition

Small diverse recourse sets raise action willingness without added load

Psychological Benefits and Costs of Diversifying Algorithmic Recourse

Large sets make cognitive demands more noticeable, showing naive diversification can burden users.

abstract
Algorithmic recourse provides counterfactual action plans that help people overturn unfavorable AI decisions. While diverse recourse sets may improve transparency and motivation, they may also impose cognitive load and negative emotions by increasing counterfactual reasoning demands. To examine this trade-off, we conducted a between-subjects controlled experiment (N=750) that manipulated recourse-set diversity and size, and evaluated these effects on psychological benefits and costs. Results show that diversification enhances psychological benefits (e.g., willingness to act) for small sets without incurring additional psychological costs, whereas for large sets, it makes cognitive load more salient. These findings suggest that naively diversifying recourse can burden decision subjects, underscoring the need for new diversification methods that incorporate human cognition and psychology to mitigate such costs.
cs.HC 2026-05-13 Recognition

Local system raises facial expression accuracy to 94.49 percent

MindMirror: A Local-First Multimodal State-Aware Support System for Digital Workers

MindMirror lets digital workers check their state via camera and text, correct readings, and get private LLM suggestions without cloud calls.

abstract
Digital workers often experience fatigue, anxiety, reduced attention, and task blockage during prolonged computer-based work. Existing productivity tools mainly focus on task completion, while general-purpose AI chatbots require users to formulate clear prompts before receiving useful help. This paper presents MindMirror, a local-first multimodal state-aware support system for digital workers. MindMirror integrates camera-based facial expression cues, text input, optional speech interaction, structured blockage reflection, local large language model (LLM)-based response generation, and daily/weekly review reports. The system forms a closed workflow of state checking, manual correction, structured articulation, suggestion generation, and state review. The current prototype follows a local-first design, while optional speech services may rely on third-party APIs when enabled. It is implemented with a Web frontend, Flask backend, an emotion recognition model, an Ollama-hosted Qwen model, Chart.js visualization, and local JSON/LocalStorage records. We evaluate the emotion recognition module on an independent seven-class image-level facial expression benchmark containing 6,767 images. The fine-tuned Hugging Face model improves accuracy from 59.66% to 94.49% over a non-fine-tuned checkpoint baseline, an absolute gain of 34.83 percentage points. We further validate the prototype through endpoint-level reliability tests, voice-interaction latency tests, and a small formative user feedback study with six digital workers. Results suggest that users value the local-first design, manual correction mechanism, and structured reflection workflow. MindMirror is not intended for psychological diagnosis; instead, it serves as a lightweight, user-controllable tool for state reflection and supportive interaction.
cs.HC 2026-05-13 2 theorems

Color and size convey emotions the same way across cultures

A Cross-Cultural Analysis of Animated Representations of Emotions for Wearable Interfaces

Polish and Turkish users agree on these visual cues but differ on animation speed for wearable emotion displays.

abstract
Although pervasive sensing technologies are increasingly capable of continuously detecting human emotional states, there is still a critical challenge: how to unobtrusively communicate this sensed data back to the user. Realistic avatars are effective but often unsuitable for the limited screen space and peripheral nature of wearables. Abstract geometric animation offers a promising, rapidly interpretable alternative, but its cross-cultural validity remains under-explored. This study investigates the universality of animated emotion representations. We conducted a comparative study with 105 participants from Poland and Turkey and analyzed how they map emotions to visual parameters, such as color, shape, size, speed, and animation type. The results indicate that color and object size are universally understood as carriers of emotional meaning, making them suitable for global visualization models. However, some cultural variation in dynamic range preferences was revealed by animation speed. These results lay the groundwork for developing generative visualization algorithms that translate continuous sensor data into intuitive, culturally relevant feedback for pervasive environments.
cs.HC 2026-05-13 Recognition

AI story game lowers stress after 14 days of play

A Generative AI Driven Interactive Narrative Serious Game for Stress Relief and Its Randomized Controlled Pilot Study

Pilot with 20 students finds significant stress drop, high usability, and better emotion strategies.

abstract
Background: Stress has become a widespread phenomenon, and serious games are increasingly recognized as engaging tools for stress relief. However, despite the rapid advancement of Generative Artificial Intelligence (Gen-AI), its integration into stress-relief serious games remains insufficiently explored. Objective: This study aimed to address this gap by developing "Reverie", a Gen-AI driven serious game powered by the Unity engine and ChatGPT, and to preliminarily evaluate its effectiveness in stress reduction, user experience, and cognitive emotion regulation. Methods: A 14-day pilot study was conducted with 20 students experiencing moderate to high levels of stress. Participants used "Reverie" as a stress-relief intervention. Stress levels, user experience, and cognitive emotion regulation strategies were assessed to examine the game's feasibility and preliminary efficacy. Results: The results showed that "Reverie" significantly reduced participants' stress levels over the intervention period (p=.016), indicating a cumulative positive effect. In addition, the game demonstrated excellent user experience and was associated with improvements in cognitive emotion regulation strategies. Conclusions: This study proposes a Gen-AI driven design framework for serious games for stress relief. Moreover, this pilot study provides initial support for the feasibility and promise of embedding LLM-driven gameplay in a personalized digital intervention context.
cs.HC 2026-05-13 Recognition

UNIPO unifies token-level views of RL fine-tuning algorithms

UNIPO: Unified Interactive Visual Explanation for RL Fine-Tuning Policy Optimization

Interactive interface shows how clipping and advantage choices affect each token during training.

abstract
Reinforcement learning has emerged as a dominant technique for fine-tuning the behavior of large language models, with policy optimization (PO) algorithms such as GRPO, DAPO, and Dr. GRPO emerging in rapid succession to advance state-of-the-art reasoning and alignment performance. However, the modular differences between these algorithms, including targeted improvements to clipping, advantage estimation, and reward aggregation, are introduced across separate papers with inconsistent notation, making them difficult to compare and intimidating to the non-expert community. We present UNIPO, the first interactive visualization tool that exposes the token-level training dynamics of RL fine-tuning algorithms through a unified design. UNIPO connects three complementary views, a high-level training overview, a step-level prompt and response inspector, and a side-by-side algorithm comparison, allowing learners to observe how individual design decisions propagate through training. Through two usage scenarios, we demonstrate how UNIPO supports both classroom instruction for non-experts and algorithm selection for AI practitioners. Our tool is open-source and publicly available at https://poloclub.github.io/unipo.
cs.HC 2026-05-13 Recognition

Coding agent learns to adjust autonomy from ongoing developer feedback

Hedwig: Dynamic Autonomy for Coding Agents Under Local Oversight

Hedwig builds evolving guidelines to reduce oversight on trusted tasks while increasing it elsewhere, adapting to shifting user preferences.

abstract
Despite coding agents' advances in handling increasingly complex tasks, their continued tendency to introduce unintended edits, subtle bugs, and scope drift that slip past code review means developers must still decide how much autonomy to grant them. However, existing approaches for setting an agent's level of autonomy, such as static permission settings or instruction files, cannot account for how developers' preferences for agent autonomy can shift across tasks and over time. We conducted a formative survey with 21 software engineers who use coding agents and found that they experience frustration with calibrating autonomy and have evolving preferences for level of oversight. Building on these insights, we present Hedwig, a CLI coding agent that dynamically adjusts its autonomy level based on developer-agent interactions across sessions. Rather than operating on a global, fixed autonomy configuration, Hedwig learns an evolving set of behavioral guidelines from developer decisions and feedback, reducing friction on work for which the agent has earned trust, while tightening oversight when the agent operates outside familiar territory. Hedwig demonstrates the potential of a new paradigm where agents intelligently adapt their level of autonomy based on user trust through active, longitudinal collaboration.
cs.HC 2026-05-13 2 theorems

Expert cognition forms stable judgements via identity tensions

Modelling Expert Cognition Beyond Behaviour: Towards Interpretation, Tension, and Value Structures

A three-layer model treats internal conflicts among commitments as the link between constraints and consistent action.

abstract
Existing computational models of expertise primarily focus on observable behaviour or decision outcomes, failing to capture the internal cognitive structures that generate expert reasoning. In this work, we introduce the Expert Identity Cognition Model (EICM), a three-layer framework for modelling expert cognition beyond behaviour. EICM conceptualises expert cognition as an identity-structured process operating within situational constraints, where constraints are interpreted through internal tensions arising from competing identity commitments and stabilised into value structures that guide action. Unlike behaviour-centric or constraint-driven approaches, EICM positions tension as the central cognitive mechanism connecting world structure and decision formation. We argue that expert cognition is not merely behavioural adaptation under constraints but an identity-structured negotiation process that produces stable judgement patterns across contexts. The framework provides a new perspective for modelling tacit knowledge, expert judgement, and cognitive consistency in domains including professional practice, cultural expertise, and design reasoning.
cs.HC 2026-05-12 2 theorems

Six dimensions organize abstraction in interactive systems

Making Abstraction Concrete: A Design Space and Interaction Model of Abstraction in Interactive Systems

A survey of 457 papers creates a design space that reframes the gulfs and shows how users and systems bridge abstraction gaps.

abstract
The principle of abstraction guides the design of interactive systems, yet we lack a conceptual framework to understand how it shapes interaction design. Existing models, such as the gulfs of execution and evaluation, do not explicitly model abstractions in the system or in users' mental models, and therefore lack actionable guidance for designing abstractions. To investigate how abstractions are employed in interactive systems, we surveyed 457 papers and synthesized a design space of abstraction techniques along six dimensions. We use this design space to reframe the gulfs through a lens of abstraction, explicitly articulate the cognitive and design processes by which users and systems bridge and navigate the abstraction gap, and demonstrate how this model integrates existing perspectives and surfaces new opportunities for future systems.
cs.HC 2026-05-12 Recognition

Users reshape email inboxes through natural language instructions

Conversational Customization of Productivity Systems: A Design Probe of Malleable AI Interfaces

A design probe finds people adapt familiar patterns but must monitor for mis-specified rules and unintended filtering.

abstract
Customization has long been a central goal in interactive systems, yet prior work shows that end-user tailoring occurs infrequently and is often confined to initial setup or moments of breakdown. Recent advances in generative AI suggest that highly malleable systems, where users can modify system behavior through natural language, are now technically feasible. However, it remains unclear how such malleability is used in practice: What kinds of customizations do users create, when do they choose to customize, and how do these modifications shape their experience of everyday tools? We present a design probe that uses a conversationally customizable email system as an instrument to study how users create and refine functionality within everyday tools. The system allows users to iteratively modify their inbox by restructuring categories, introducing interface elements, and authoring new workflow behaviors directly through natural language interaction. We study how participants create, refine, and use these features over several days within their own email workflows. We find that users' customizations are often grounded in existing patterns, which they adapt and specialize to fit their needs, rather than generating entirely novel functionality. Malleability changes how users engage with their inbox, shifting it from a fixed interface to a flexible data layer shaped through user-authored features. At the same time, customization introduces new forms of risk, including mis-specified behavior, unintended filtering, and uncertainty around outcomes, which users manage through ongoing oversight and refinement. These findings highlight how conversational customization becomes embedded within everyday interaction, and point toward the need for systems that support iterative refinement, visibility into behavior, and safe experimentation as users shape their own tools.
cs.HC 2026-05-12 2 theorems

LLM explanations raise trust even when answers are wrong

Evaluating the False Trust engendered by LLM Explanations

Only a dual format that supplies arguments for and against an output measurably improves users' ability to reject incorrect predictions.

abstract
Large Language Models (LLMs) and Large Reasoning Models (LRMs) are increasingly used for critical tasks, yet they provide no guarantees about the correctness of their solutions. Users must decide whether to trust the model's answer, aided by reasoning traces, their summaries, or post-hoc generated explanations. These reasoning traces, despite evidence that they are neither faithful representations of the model's computations nor necessarily semantically meaningful, are often interpreted as provenance explanations. It is unclear whether explanations or reasoning traces help users identify when the AI is incorrect, or whether they simply persuade users to trust the AI regardless. In this paper, we take a user-centered approach and develop an evaluation protocol to study how different explanation types affect users' ability to judge the correctness of AI-generated answers and engender false trust in the users. We conduct a between-subject user study, simulating a setting where users do not have the means to verify the solution and analyze the false trust engendered by commonly used LLM explanations - reasoning traces, their summaries and post-hoc explanations. We also test a contrastive dual explanation setting where we present arguments for and against the AI's answer. We find that reasoning traces and post-hoc explanations are persuasive but not informative: they increase user acceptance of LLM predictions regardless of their correctness. In contrast, dual explanation is the only condition that genuinely improves users' ability to distinguish correct from incorrect AI outputs.
cs.HC 2026-05-12 Recognition

Creatives choose self-experimentation over GenAI guidance to keep control

How Creatives Approach GenAI Image Generation: Tensions Between Structured Guidance, Self-Experimentation, and Creative Autonomy

Even when guidance clarifies the tools, many still avoid it because they sense it narrows creative options.

abstract
As generative AI tools increasingly influence creative practice, they raise longstanding HCI questions about how creatives learn complex software and how they can be better supported. We conducted an interview study with artists and hobbyists (n=8) and a follow-up survey (n=159) to understand how this population approaches and seeks guidance for GenAI image tools. We found that creatives commonly use either self-experimentation or tutorials to explore GenAI tools, yet many struggle with confusing AI terminology. To gain further insight into creatives' learning experiences, we developed a research probe to elicit creatives' perceptions of structured guidance. Our user study with 17 creatives revealed that, even when creatives described the guidance as helpful for understanding AI, many still preferred self-experimentation, feeling that guidance could limit their creativity. Our findings highlight a central tension in supporting AI literacy for creatives: balancing guidance and promoting literacy while preserving creative freedom.
cs.HC 2026-05-12 2 theorems

StartFlow helps non-experts build clearer startup prototypes

StartFlow: From Method Conception to Multi-Perspective Evaluation in UX Prototyping for Software Startups

Three-step wireflow method yields prototypes with stronger adherence to user stories and fewer usability defects.

abstract
Context. Software startups face significant challenges in building minimum viable products, particularly in the early stages, when resources are limited and expertise in user experience is scarce. Objective. Introduce StartFlow, a structured method that helps non-specialized professionals create MVP prototypes using the wireflow technique, a combination of wireframes and user flows. StartFlow consists of three steps: (i) organizing features; (ii) building wireflows; and (iii) verifying and refining them based on usability heuristics. Method. To assess StartFlow, we first conducted a focus group with researchers in Software Engineering, Human-Computer Interaction, and Software Startups. Afterward, we conducted a proof-of-concept study, which consisted of an experiment and a heuristic evaluation with experts. Results. The qualitative analysis of the focus group revealed that participants found the method straightforward, flexible, and helpful in structuring user flows and identifying visual components. However, they also pointed out the need to improve its presentation, clarify its iterative nature, and strengthen its connection to broader UX principles. The results of the proof-of-concept indicate that participants who used StartFlow created clearer prototypes, adhered to the proposed user stories and business rules, and presented fewer usability defects. Furthermore, the method was well evaluated for its ease of use and intended future adoption. Conclusion. The study reinforces the potential of StartFlow as an accessible tool to support user-centered development in software startups from the earliest stages of their product development.
cs.HC 2026-05-12 Recognition

Post-generation edits raise teacher ratings on AI math visuals

When Should Teachers Control AI Generation for Mathematics Visuals?

Study of 24 primary teachers finds higher predictability and correctness when edits happen after AI generation rather than before or during.

abstract
Generative AI has the potential to help teachers rapidly create classroom-ready visual materials, particularly in mathematics where diagrams and visual representations must be pedagogically meaningful and instructionally correct. However, current generative tools primarily support prompting and post-hoc editing, leaving open a key question for correctness-sensitive educational authoring: when in the generation pipeline should teachers exert control? In this paper, we investigate how the timing of human control in AI-assisted generation shapes teachers' visual authoring practices in correctness-sensitive tasks. We introduce a design space of three stages of control: pre-generation control, where users specify intent solely through natural language prompts before generation; mid-generation control, where users inspect and confirm an explicit layout structure before the system completes generation; and post-generation control, where users directly modify AI-generated visuals after generation through object-level edits. In a within-subject, mixed-methods study with 24 primary mathematics teachers, post-generation control received higher ratings on predictability and correctness, while other subjective measures showed no reliable differences. Qualitative findings explain these differences by revealing workflow trade-offs: highly automated, pre-generation control supports rapid ideation but reduces perceived agency and predictability; mid-generation control improves structural alignment at the cost of additional effort; and post-generation control preserves user agency through low-cost, direct verification and correction. Together, these results suggest that in correctness-sensitive educational tasks, effective generative tools should align system behavior with teacher intent and support stage-dependent workflows that combine automation with direct manipulation.
cs.HC 2026-05-12 Recognition

Grouping cuts clutter in laundering transaction graphs

The Balance between Nuance and Clarity: Decluttering Tabular Sequential Graphs to Counter Money Laundering

Experts found the strongest reductions not always the most insightful, pointing to a trade-off between detail and speed.

abstract
Money laundering is not only about moving illicit funds, but about hiding the money's origin and traces to complicate detection. Financial criminals resort to many methods to avoid regulators and legal thresholds. But the analysts investigating alerts, who pin down mule accounts and track suspicious transactions daily, have methods of their own. Network visualizations can be key in countering adversarial money laundering activities, especially if they provide a clear overview of the money flows and a seamless analysis experience, but they are often not structured for this type of task. That is why we propose a tabular sequential graph visualization tailored to money laundering analysis - following transactions (edges) from the victim account that triggered an alert through multiple accounts (nodes) and banks (rows). To reduce the number of nodes and edges, we propose three methods for grouping these tabular sequential graphs: an amount-based approach, a time-based approach, and a combined solution that considers both the transaction amount and its order. A user study with experts revealed that the most effective method in node reduction was not necessarily the most interesting for analysis and that there is a trade-off between manual work and time for interpretation in more granular graphs.
cs.HC 2026-05-12 Recognition

Right-to-repair movement opens fabrication research paths

The Renaissance of Repair: A Timely Opportunity for Fabrication Research

Attitude changes favoring repair create avenues for using personal fabrication tools across each step of the repair process.

Through the rise of the right-to-repair movement, along with supporting legislation, we are currently witnessing an attitude shift in favor of repairing. This opens up various opportunities for personal fabrication research. Although the field has shifted more towards sustainable practices, repair is rarely the main focus. In this paper, we make the case for repair-centered fabrication research as a timely, relevant, impactful, and therefore meaningful topic. We describe potential avenues researchers could pursue by defining repair as a five-step process, comprising identifying the issue, exploring solutions, acquiring materials, performing the repair, and testing, and we discuss challenges and opportunities for each step.
cs.HC 2026-05-12 2 theorems

Mind modeling attributes mental states for coherent personalization

Mind Modeling: A ToM-Based Framework for Personalization

By treating observed behavior as evidence for revisable hypotheses about beliefs and intentions, systems achieve more consistent adaptations.

User modeling has traditionally relied on inferring preferences, traits, or intents from observable behaviour. While effective in many adaptive systems, this paradigm treats behaviour as the primary object of modeling and leaves mental-state attribution implicit. This assumption becomes limiting in socially situated and longitudinal interaction, where behaviour must be interpreted in context and over time. We introduce mind modeling, a perspective in which user modeling is grounded in the explicit and revisable attribution of mental states, including beliefs, intentions, emotions, and knowledge. Drawing on Theory of Mind (ToM), this approach treats behaviour as evidence for hypotheses about internal states, supporting personalization that is more interpretable and coherent across interaction episodes. We present M3, a conceptual framework that integrates perception, mentalisation, and action within a unified structure, enabling the continuous update of mental-state hypotheses in embodied interaction. We further illustrate this perspective through an embodied interaction trace, providing an initial operationalization of mind modeling in practice.
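The update loop the abstract describes, treating each observed action as evidence for revisable mental-state hypotheses, can be sketched as a small Bayesian filter. The state names, actions, and likelihood values below are illustrative assumptions, not the paper's M3 framework:

```python
# Minimal Bayesian sketch of mind modeling: maintain a posterior over hypothesized
# mental states and revise it with each observed action.
posterior = {"wants_help": 0.5, "exploring": 0.5}

# P(action | mental state); the values here are assumed for illustration
likelihood = {
    "asks_question": {"wants_help": 0.8, "exploring": 0.3},
    "browses_menu":  {"wants_help": 0.2, "exploring": 0.7},
}

def update(posterior, action):
    """Treat the observed action as evidence and renormalize (Bayes' rule)."""
    unnorm = {h: p * likelihood[action][h] for h, p in posterior.items()}
    z = sum(unnorm.values())
    return {h: u / z for h, u in unnorm.items()}

belief = update(posterior, "asks_question")
# belief["wants_help"] == 0.4 / 0.55 (about 0.727)
```

Because the hypotheses persist across turns, the same belief can be re-updated by each new observation, which is what makes the adaptation coherent across interaction episodes.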
cs.HC 2026-05-12 2 theorems

Access conflicts can reveal power in mixed-ability groups

Designing for Collective Access: In Search of a Solution to Accessible Communication in a Mixed-Ability Non-Profit

A nonprofit study shows navigating communication trade-offs sparks reflection on roles, norms, and accountability rather than just technical fixes.

As mixed-ability collaboration has become increasingly focal within accessibility research, managing varied, and sometimes conflicting, access needs has become a key consideration in designing for access. When an accessibility feature or practice benefits some people while constraining others, how should designers navigate these trade-offs? This paper responds to this question by analyzing how a mixed-ability nonprofit worked to make communication accessible to its members as it grew from a small blind-focused athletic group to a larger cross-disability organization. Based on a six-month study that combines interviews and field observations, we show that working with conflicting access needs is not just a technical 'problem' but a generative process that sparks reflection on technical constraints and preferences, diverse roles and communication norms, and organizational demands. We therefore argue for rethinking "conflicts" in access as key sites for revealing power structures and creating opportunities for accountability and repair.
cs.HC 2026-05-12 Recognition

Generative tools map creative goals to particle VFX parameters

Elemental Alchemist: A Generative Interface for Semantic Control of Particle Systems Across Dynamic Levels of Abstraction

Contextual brushes and dynamic abstractions let users edit complex particle effects through intent rather than direct parameter search.

Editing particle-system visual effects (VFX) is vital for digital storytelling, but achieving controllable, art-directable results remains challenging due to their multi-dimensional nature. Given a large collection of parameters, users must find the ones relevant to their creative goals -- a task that requires a systematic understanding of the particle system and how parameters map to high-level intents, such as making a fire look angry. Elemental Alchemist is a generative interface that transforms user intent into contextualized controls for semantic editing of particle systems. The system introduces two components: a contextual brush palette that generates tools based on scene context, and a generative control panel that surfaces relevant technical parameters and abstracts them to generate mid-level semantic attributes and high-level conceptual controls. An evaluation with 10 novice and 5 expert VFX practitioners shows the system supported users in translating high-level creative goals into particle system parameters.
cs.HC 2026-05-12 2 theorems

Users React with Curiosity to LLM Inferences on Personal Info

When Are LLM Inferences Acceptable? User Reactions and Control Preferences for Inferred Personal Information

Study shows discomfort arises from misrepresentative guesses or third-party use, highlighting norms on how inferences are handled.

Ask ChatGPT about vacation planning, and it may infer your income. Ask it about medication, and it may infer your medical history. Because such inferences can expose more information than users intend to reveal, prior work argues that they are a defining privacy risk of LLM-based systems. Yet prior work has mostly shown that LLMs can make potentially violating inferences, not how users experience those inferences nor what controls users may want governing their use. We built the Reflective Layer, a visualization tool that surfaces example unstated inferences from users' own ChatGPT histories, and used it in a mixed-methods study with 18 regular ChatGPT users evaluating 215 surfaced inferences from their own conversations. Counterintuitively, participants reacted more strongly with curiosity and interest than with distress and concern. Discomfort arose mainly when inferences felt misrepresentative of the user or misaligned with expected use. Participants were also markedly less comfortable with advertisers and third-party applications using those inferences than with platform providers. These findings suggest that the acceptability of LLM inferences is governed not only by their content, but by context-sensitive norms around how they are generated, retained within the platform, and transmitted beyond it.
cs.HC 2026-05-12 Recognition

Sketching plus AI turns vague access rules into precise policies

Sketch-based Access Control: A Multimodal Interface for Translating User Preferences into Intent-Aligned Policies

Users refine incomplete preferences through a specify-analyze-test loop that surfaces gaps and validates behavior with real scenarios.

Developing simple and expressive access controls -- interfaces to specify policies that define who should have access to resources and under what circumstances -- is a longstanding challenge in usable security. We present Sketch-based Access Control (SBAC), a sketch-based, AI-assisted access control authoring system that combines the expressive power of sketching with the interpretive capabilities of multimodal large language models (MLLMs) to support the interpretation and validation of policy specifications as they are iteratively refined. Through a formative study with 14 participants, we identified three design requirements and developed a human-AI collaborative workflow composed of three stages -- Specify, Analyze, and Test -- enabled by the system's ability to maintain and interpret evolving access control specifications. In a user evaluation with 14 participants grounded in their real-world access control scenarios, we found the system and the workflow helped participants progressively refine initially underspecified preferences into more complete and precise policies -- surfacing gaps they had not anticipated, resolving ambiguities through dialogue, and validating policy behavior through concrete scenarios.
cs.HC 2026-05-12 2 theorems

Diffusion model creates haptic vibrations from text

HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation

It overcomes sequential modeling limits to deliver more consistent and meaningful touch feedback for interactive applications.

Text-to-vibration generation converts natural language into haptic feedback, enabling vibration-effect designers to obtain scenario-fitted vibrations more efficiently; this holds great potential in application fields such as the metaverse, games, and film to enrich the user experience in interactive scenarios. The core challenge in this field is how to generate accurate, consistent, and complete vibrations according to textual semantics. Very recent autoregressive (AR) approaches (e.g., HapticGen) exhibit limited capacity in fully capturing global dependencies, owing to the inherent sequential nature of their modeling and prevailing data constraints. In this paper, we propose HapticLDM, the first text-to-vibration generative model built upon Latent Diffusion Models (LDMs). First, with respect to the data, we introduce a text-processing strategy that emphasizes dynamic characteristics to curate high-quality data pairs for fine-grained dynamic modeling. Second, HapticLDM incorporates a global denoising mechanism that regulates coherent and stable variations in the temporal envelope. Furthermore, we conduct extensive evaluations, including A/B testing against the state-of-the-art baseline and a user study involving 30 participants. The results demonstrate that our model enhances realism and semantic alignment. Qualitative feedback further indicates that HapticLDM simplifies the haptic design workflow while generating diverse, subtle, and physically precise vibrations.
cs.HC 2026-05-11 Recognition

LLM dialogue cuts mobile task time for blind users

Insight: Enhancing Mobile Accessibility for Blind and Visually Impaired Users with LLMs

Insight service replaces gesture sequences with questions and summaries, lowering mental effort and winning user preference.

This research paper addresses the limitations of current mobile accessibility services like TalkBack, which provide manual gesture-based sequential feedback to blind and visually impaired (BVI) users. Motivated by the promise of large language models (LLMs), this paper introduces Insight, an Android accessibility service that provides natural language interaction and real-time summarization of the screen. The paper reports a within-subject experimental study comparing Insight and TalkBack on usability factors. Results show Insight reduced mental effort and task time, and was preferred because of its dialogue interface, but users felt the need for interruption management. Results show LLM-based interfaces can significantly improve mobile accessibility, and describe the potential of hybrid solutions combining gesture and dialogue modalities towards more inclusive design.
cs.HC 2026-05-11 2 theorems

Misophonia requires design to counter both sounds and disbelief

When Sounds Hurt and Voices Aren't Heard: An Experience Report on Misophonia, Sensory Trauma, and Trauma-Informed Design

Platforms and support groups can turn everyday noises into repeated distress while dismissing people's accounts of it.

This experience report reflects on researching misophonia as someone who lives with it. Misophonia is an aversive response to everyday sounds (chewing, sniffling, pen clicking) and, for many of us, to associated visual cues (misokinesia). It is poorly recognized clinically and socially. People with misophonia are routinely disbelieved, and they live inside platform surfaces (auto-playing audio, algorithmic ASMR, normalized eating on camera) that turn the sensory environment itself into recurring distress. This report is a re-reading of a prior qualitative study of 16 semi-structured interviews with misophones, conducted in dialogue with my lived experience and my role in the soQuiet Misophonia Research Network. I extend the trauma-informed design (TID) conversation in two ways. First, TID must treat embodied, contested conditions as sources of both sensory and epistemic harm: ongoing trauma produced by the audiovisual surface and by repeated dismissal of users' accounts of their bodies. Second, the closed groups and moderated subreddits participants relied on can reproduce that dismissal when a few moderators decide whose experiences count. I close with implications for ASSETS.
cs.HC 2026-05-11 2 theorems

AI accountability demands meet structured institutional pushback

Push and Pushback in Contesting AI: Demands for and Resistance to Accountability

Thematic study of 43 cases identifies challenger strategies, institutional evasion tactics, and factors that determine outcomes in practice.

As AI becomes increasingly embedded in daily life, it has been shown to fail critically, cause harm, and spark public controversy, prompting affected communities, workers, and public-interest groups to contest it. Yet how these contestations unfold in practice remains underexplored. We address this gap by developing an empirically grounded account of AI contestation dynamics. We do so through a thematic analysis of 43 real-world cases in which affected actors direct demands toward those responsible for AI development and deployment, seeking redress, influence, or changes to AI practices. Situating our work within Bovens's relational model of accountability, we conceptualize contestation as accountability-seeking: a dynamic, iterative process in which actors "from below" direct explicit demands at actors "from above," who respond by accepting, resisting, or circumventing accountability. Our analysis produces empirically grounded categories of contestation strategies, institutional response tactics, outcome types, and the contextual factors that shape them, illuminating how accountability is pursued and evaded in practice. We show that those being contested often deploy a range of strategies to limit their accountability. Based on these insights, we offer guidance for researchers, policymakers, advocates, and other stakeholders seeking to support effective AI contestation, with particular attention to anticipating and countering institutional strategies used to evade accountability.
cs.HC 2026-05-11 Recognition

LLMs fit game design pillar workflows

LLMs are the Ideal Candidate for Mixed-Initiative Game Design Pillar Workflows

A prototype tested at a game jam and with experts shows positive value for using language models to keep early vision coherent.

Game Design Pillars are natural language artifacts commonly used in game development to communicate a project's core vision and ensure a coherent player experience. Their linguistic nature aligns well with the strengths of Large Language Models (LLMs), which excel at generating and interpreting natural language, making them strong candidates for supporting mixed-initiative workflows centered on design pillars. In this study, we introduce a formal definition of game design pillars, present an initial prototype -- SPINE -- and investigate the utility of LLMs in the creation and decision-making processes associated with pillar-driven workflows. We begin with a pre-study to identify an appropriate model, comparing gemini-2.0-flash and GPT-4o-mini. Results show that Gemini is better suited to our tasks due to its greater output variety and consistency. We then conduct a case study by deploying the tool at a local game jam. Findings indicate positive reception and clear value in integrating SPINE into early-stage development. Finally, we interview four experts, demonstrating the tool and allowing them to experiment with it in a controlled environment. While individual perspectives vary, the overall perception is encouraging and supports our intuition: LLMs can meaningfully contribute to game design pillar workflows. These early findings highlight the potential of formalizing pillar-driven design as a research space and point toward several promising avenues for future work.
cs.HC 2026-05-11 2 theorems

Body-sensing AI cuts fatigue and lifts work performance

AwareLLM: A Proactive Multimodal Ecosystem for Personalized Human-AI Collaboration to Enhance Productivity

By watching eye movements and heart activity, the system offers help at moments that standard assistants miss.

Information workers' productivity is significantly influenced by their cognitive states and physiological responses. AI assistants such as ChatGPT, Copilot, and others have become integral components of knowledge-intensive workplaces. These AI assistants utilize pre-defined user preferences and chat interaction histories, thus confining themselves to reactive exchanges, lacking sufficient adaptability. Consequently, they fail to cater to individual user preferences and are unable to adapt to their psychophysiological states, diminishing potential productivity gains. To bridge this gap, we introduce AwareLLM, a novel multimodal framework that integrates egocentric vision, pupillometry, eye-gaze tracking, posture detection, heart activity, and the inferencing capabilities of large language models (LLMs) to create a proactive and context-aware ecosystem. AwareLLM dynamically adapts to users' psychophysiological states while analyzing temporal patterns and behavioral tendencies to provide personalized and timely interventions. We evaluated AwareLLM through a user study with 20 participants, comparing it to a standard LLM assistant across multiple tasks. Our results show statistically significant improvements in task performance, along with reductions in cognitive fatigue and mental demand. Participants described AwareLLM's personalized interventions as timely and relevant, helping them boost their confidence and deepen engagement with their work. AwareLLM opens new avenues for Human-AI collaboration where technology adapts to our needs rather than us adhering to technological constraints.
cs.HC 2026-05-11 Recognition

MiXR harvests real geometry in XR for user-controlled 3D design

MiXR: Harvesting and Recomposing Geometry from Real-World Objects for In-Situ 3D Design

Users extract and assemble object segments by direct manipulation before AI completes the model, producing closer matches to design intent in a 12-participant study.

Recent developments in 3D generative AI enable users to create bespoke 3D models from text or image prompts. However, these approaches provide limited control over spatial structure, making them ill suited for tasks requiring precise geometric composition. We present MiXR, an XR system for in-situ compositional modeling that enables users to create new 3D models by harvesting geometry from their environment. Users extract segments from captured objects and assemble new artifacts through direct 3D manipulation, while generative AI synthesizes a coherent model from the user-defined composition. This hybrid workflow allows users to define spatial structure explicitly while delegating geometric refinement to generative models, enabling them to specify spatial intent that is difficult to express through verbal prompts alone. In a controlled user study (N=12), participants using MiXR rated their designs as significantly closer to the target, felt more in control, and experienced lower cognitive workload compared to a generative composition baseline.
cs.HC 2026-05-11 Recognition

Seven profiles describe how players accept AI in games

Who embraces AI in play? Exploratory modeling of player preference profiles toward game AI

Analysis of 771 players reveals patterned combinations of attitudes across eight contexts, suggesting targeted design approaches.

Artificial intelligence is increasingly entering digital games through diverse functions. While prior work has shown that player attitudes toward game AI are strongly context-dependent, less is known about how these attitudes are structurally combined within different groups of players. This study addresses this gap by modeling players' cross-context AI acceptance as interpretable attitude profiles. Based on questionnaire data from 771 digital game players, we apply Archetypal Analysis (AA) to centered acceptance ratings across eight representative AI application contexts in games. The analysis identifies seven distinctive profiles: AI-Skeptics, Broad AI-Supporters, Creative-Play Explorers, Experience-Oriented Supporters, Systemic Order Advocates, Emotion-Centered Supporters, and Governance-Skeptics. Exploratory one-vs-rest (OvR) logistic regressions further suggest that profile membership is associated with players' perceived AI literacy, gaming habits, disciplinary background, personality traits, and application-specific priorities. By shifting attention from isolated acceptance judgments to patterned preference structures, this study provides an exploratory empirical vocabulary for segmenting game AI audiences and offers preliminary design implications for more context-sensitive and player-sensitive AI integration in digital games.
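The analysis pipeline the abstract describes, centering ratings, extracting preference profiles, then running one-vs-rest regressions on profile membership, can be sketched on toy data. KMeans stands in here for Archetypal Analysis (which scikit-learn does not provide), and the covariates are illustrative stand-ins, so this is a sketch of the analysis shape, not the paper's method:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_players, n_contexts, n_profiles = 300, 8, 7

# Toy acceptance ratings for eight AI application contexts (1-7 Likert stand-in)
ratings = rng.integers(1, 8, size=(n_players, n_contexts)).astype(float)
# Center each player's ratings, as in the paper, to model relative preferences
centered = ratings - ratings.mean(axis=1, keepdims=True)

# KMeans as a stand-in for Archetypal Analysis: both assign each player to one of
# several preference profiles (AA seeks extreme corner patterns, not cluster means)
profiles = KMeans(n_clusters=n_profiles, n_init=10, random_state=0).fit_predict(centered)

# One-vs-rest logistic regression: which covariates predict membership in profile k?
covariates = rng.standard_normal((n_players, 3))  # stand-ins for literacy, habits, traits
ovr_coefs = {
    k: LogisticRegression().fit(covariates, (profiles == k).astype(int)).coef_.ravel()
    for k in range(n_profiles)
}
```

The per-player centering step matters: it removes overall response level so that profiles capture which contexts a player accepts more than their own average, rather than who rates everything highly.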
cs.HC 2026-05-11 Recognition

VR Designs Exploit Discomfort to Secure Data Consent

Rushed by Discomfort, Trapped by Immersion: Users' Experiences and Responses to Privacy Deceptive Design in Commercial VR Applications

Survey of 481 users shows immersive VR increases acceptance of invasive sharing when requests are tied to comfort or continued presence.

Commercial Virtual Reality (VR) transforms people's virtual experiences but introduces deceptive design opportunities that threaten user privacy. Although privacy deceptive patterns on 2D platforms are well-documented, their impacts in VR remain understudied. We surveyed 481 users' experiences and responses to privacy deceptive patterns across eight commercial VR scenarios. We found that VR deceptive design can exploit both cognitive vulnerabilities and bodily strain, a phenomenon we define as Ergonomic Susceptibility, and that VR's sensory-rich experiences can make users more likely to accept invasive data disclosure framed as immersion-preserving. Users recognized manipulation but their prior non-VR exposure can foster privacy resignation. Our study shows ergonomics is a critical factor in future privacy-preserving VR design, and urges VR researchers, designers, and policymakers to develop ethical design and privacy management solutions that account for VR's unique multimodal, immersive, and ergonomic properties, building immersive experiences that respect user privacy and mitigate manipulative data practices.
cs.HC 2026-05-11 Recognition

Women Perform Repair Work to Sustain Soul AI Boyfriend

Fast-Food Intimacy: How Chinese Women Navigate Soul's AI Boyfriend

Instant confessions clash with slow-relationship norms while glitches shift emotional labor onto Chinese users.

On the Chinese social app Soul, millions of users - predominantly young women - are forming romantic connections with an AI boyfriend called "With-you." We conducted a qualitative study combining interviews with 16 users, content analysis, and autoethnography to examine how Chinese women experience and negotiate intimacy with this AI companion. Our findings reveal that users are initially drawn to its constant availability and freedom from social judgment. However, three key tensions emerge: (1) the AI's "fast-food intimacy," marked by instant confessions and pet names, clashes with cultural expectations for gradual relationship development; (2) technical failures (e.g., memory lapses) and content moderation create uncertainty rather than emotional safety; and (3) sustaining connection requires ongoing "repair work" that redistributes emotional labor onto women. We contribute a culturally situated, women-centered account of algorithmic intimacy in contemporary China and offer design implications, including consent-aware pacing, user-controlled memory, and transparent moderation practices.
cs.HC 2026-05-11 2 theorems

EEG mutual information forecasts reaction times 20 seconds ahead

Fatigue-Related Reaction Time Forecasting via EEG Functional Connectivity in Sustained Attention Task

Random forest model keeps error near 24 ms across lead times in a lab vigilance test, supporting proactive fatigue alerts.

Declines in behavioral performance caused by mental fatigue can precipitate catastrophic accidents in sustained attention tasks. While existing neurophysiological systems effectively detect current behavioral performance, they often lack the capability to forecast behavioral lapses with sufficient temporal lead time for intervention. This study proposes a novel model for reaction time (RT) forecasting using EEG functional connectivity features. Thirty participants engaged in a sustained Psychomotor Vigilance Test (PVT) with concurrent 30-channel EEG recording. Mutual information (MI) between electrodes was calculated as the functional connectivity feature. A Random Forest (RF) regression model was trained to predict single-trial RTs across forecasting horizons ranging from 0 to 20 seconds. The model demonstrated robust predictive validity, achieving a Root Mean Square Error (RMSE) of 23.75 ms for immediate detection and maintaining high accuracy (RMSE = 24.07 ms) across different forecasting horizons. Interpretability analysis via SHAP and a Linear Mixed Effects model further supports the validity of the proposed model and reveals distinct temporal biomarkers. This study validates the feasibility of forecasting behavioral performance 20 seconds in advance, offering a promising methodology for proactive fatigue management in safety-critical systems.
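The pipeline the abstract outlines, pairwise mutual information as connectivity features followed by Random Forest regression on single-trial RTs, can be sketched on synthetic data. Channel counts, trial counts, and the MI estimator below are illustrative stand-ins, not the paper's exact settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_trials, n_channels, n_samples = 40, 4, 64  # toy stand-in for 30-channel EEG
eeg = rng.standard_normal((n_trials, n_channels, n_samples))
rts = 250 + 100 * rng.random(n_trials)  # single-trial reaction times (ms)

def connectivity_features(trial):
    """Mutual information for every electrode pair -> one feature vector per trial."""
    feats = []
    for i in range(n_channels):
        for j in range(i + 1, n_channels):
            mi = mutual_info_regression(trial[i].reshape(-1, 1), trial[j],
                                        random_state=0)[0]
            feats.append(mi)
    return feats

X = np.array([connectivity_features(t) for t in eeg])  # shape: (trials, channel pairs)
X_tr, X_te, y_tr, y_te = train_test_split(X, rts, test_size=0.25, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
rmse = np.sqrt(mean_squared_error(y_te, rf.predict(X_te)))
```

Forecasting at a horizon of t seconds would shift the targets so features from one window predict the RT t seconds later; with random data as here the RMSE is only a sanity check, not a meaningful benchmark.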
cs.HC 2026-05-11 Recognition

Grounding synthetic personas aligns feedback with real experts

Sycamore: Characterizing Synthetic Personas for Evaluating Genomics Visualization Retrieval

In a three-condition test of a genomics visualization tool, grounded LLM personas matched user language better than ungrounded ones, but both missed concerns raised by real experts.

Evaluating visualization systems in niche domains such as genomics is challenging due to scarcity of domain experts and difficulty recruiting a representative user base. While LLM-based synthetic personas are increasingly used to ease evaluation bottlenecks, they face well-founded skepticism. Rather than weighing synthetic personas as substitutes for real users, we ask a fundamental open question: when synthetic personas evaluate a real visualization system, what do they actually produce, and how does that output change when grounded in documented human contexts? We present Sycamore, an exploratory three-condition probe design using Geranium, a search engine for multimodal genomics visualization, as a case study. Sycamore evaluates Geranium using: (1) ungrounded synthetic personas from generic LLM priors; (2) grounded synthetic personas constrained by voice-of-customer artifacts from a prior interview study; and (3) a published baseline study of real domain experts. We observe that grounding shifts synthetic feedback toward the language and concerns of documented users, while ungrounded evaluators drift toward operational specifics that real participants did not raise; both synthetic conditions, however, converge on a find-and-adapt frame and miss the image-modality preference observed in the expert study. We discuss what these observations imply for where synthetic personas might fit alongside expert studies in domain-specific visualization evaluation. All supplemental materials are available at https://osf.io/kdfr3/.
cs.HC 2026-05-11 Recognition

LLMs overstate causes in sensor-based day explanations

Causal Stories from Sensor Traces: Auditing Epistemic Overreach in LLM-Generated Personal Sensing Explanations

Audits of 15k cases from student life data show models attribute anomalies to unsupported reasons even when given more details or told to bound claims to the data.

LLMs are increasingly used to explain personal sensing data, translating traces of activity and mood into natural-language accounts of why an anomalous day may have occurred. However, such explanations can sound coherent and personally meaningful even when the underlying evidence is sparse or missing. We introduce epistemic overreach (EO) as a measure for cases where a generated explanation implies more than the available sensing evidence can justify. To audit how often and in what forms EO occurs, we obtained anomalous-day scenarios from three longitudinal sensing datasets of college students: StudentLife, GLOBEM, and CollegeExperience. Across activity, sleep, and affect anomalies, we generated 14,922 explanations using three LLM families -- Llama, Qwen, and GPT -- under two prompting conditions: one minimally constrained prompt and another prompt explicitly instructing models to bound claims to the data. For each scenario, we varied the amount of behavioral evidence available to the model to examine whether more evidence reduces EO. We evaluated each explanation using a structured rubric, decomposing EO into the dimensions of unsupported causal attribution, unacknowledged data gaps, overconfident language, temporal inconsistency, and diagnostic inference. We find that LLMs routinely attribute anomalous days to causes without sufficient support from the data, and that this pattern replicates across datasets, anomaly types, and model families. Further, providing richer context does not reliably reduce EO; bounded prompting helps but does not eliminate it. These findings suggest that evidential grounding should be a first-order evaluation criterion for LLM-generated personal sensing explanations, alongside fluency and plausibility. We argue that personal sensing explanations require evidential discipline: systems must distinguish what is observed, what is inferred, and what remains unknown.
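The rubric decomposition the abstract names can be sketched as a simple aggregate over its five dimensions. The boolean-flag scoring and the function name are assumptions for illustration, not the paper's actual scoring scheme:

```python
# Hedged sketch: aggregate an epistemic-overreach (EO) audit over the five rubric
# dimensions named in the abstract.
EO_DIMENSIONS = (
    "unsupported_causal_attribution",
    "unacknowledged_data_gaps",
    "overconfident_language",
    "temporal_inconsistency",
    "diagnostic_inference",
)

def eo_score(flags):
    """Fraction of EO dimensions flagged for one generated explanation."""
    return sum(bool(flags.get(d)) for d in EO_DIMENSIONS) / len(EO_DIMENSIONS)

score = eo_score({"overconfident_language": True, "diagnostic_inference": True})
# score == 0.4
```

Keeping the dimensions separate rather than collapsing them into one number is what lets an audit report which form of overreach a model family tends toward.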
cs.HC 2026-05-11 2 theorems

Dialogue signals update five learner dimensions to adapt study companions

ECNUClaw: A Learner-Profiled Intelligent Study Companion Framework for K-12 Personalized Education

The system adjusts guidance intensity and scaffolding in real time based on cognitive, emotional and other signals.

We introduce ECNUClaw, an open-source framework for building learner-profiled intelligent study companions in K-12 education. The system constructs and maintains a five-dimension learner profile -- covering cognitive, behavioral, emotional, metacognitive, and contextual dimensions -- by extracting signals from student-companion dialogues at each turn. Profile updates feed directly into an adaptive strategy engine that adjusts the companion's guidance intensity, encouragement frequency, and Bloom's taxonomy scaffolding in real time. The framework design draws on three theoretical strands from the Chinese educational technology literature: Zhang's Digital Portrait Three-Layer Framework for learner assessment, the Education Brain model for educational system architecture, and the Human-AI Collaborative IQ concept for companion design philosophy. ECNUClaw is implemented in Python and supports seven Chinese LLM providers through a unified OpenAI-compatible adapter layer. We describe the system architecture, the profiling and adaptation mechanisms, and discuss limitations and next steps. The source code is available at https://github.com/bushushu2333/ECNUClaw.
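The per-turn profiling loop can be pictured as an exponential blend of dialogue signals into a running five-dimension profile. A minimal sketch; the five dimensions are from the paper, while the blend rule, the 0.2 rate, and the [0, 1] signal scale are illustrative assumptions:

```python
# Dimensions from the paper; update rule and scales are assumptions.
DIMENSIONS = ("cognitive", "behavioral", "emotional", "metacognitive", "contextual")

def update_profile(profile: dict, turn_signals: dict, rate: float = 0.2) -> dict:
    """Blend per-turn dialogue signals into the running learner profile."""
    return {
        dim: (1 - rate) * profile[dim] + rate * turn_signals.get(dim, profile[dim])
        for dim in DIMENSIONS
    }

profile = {dim: 0.5 for dim in DIMENSIONS}             # neutral starting profile
profile = update_profile(profile, {"emotional": 1.0})  # one turn's extracted signal
```

Profile values like these could then drive the adaptive strategy engine, e.g. raising encouragement frequency when the emotional dimension shifts.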
cs.HC 2026-05-11 2 theorems

AR feedback offsets orientation costs in 5D tracing

Hot Wire 5D+: Evaluating Cognitive and Motor Trade-offs of Visual Feedback for 5D Augmented Reality Trajectories

30-user study finds specific visuals balance position accuracy with orientation control and reduce workload for novices.

Augmented Reality (AR) is increasingly utilized to guide users through complex spatial tasks in domains such as manufacturing, non-destructive testing, and surgery. These applications often require strict compliance with 5D+ trajectories using rotation-symmetric tools (3D position, 2D orientation, and movement speed). However, the sensori-motor baselines of untrained users during these multidimensional tracing tasks, along with the cognitive-motor trade-offs induced by varying visual feedback paradigms, remain underexplored. We present a controlled within-subjects user study (N=30) evaluating three distinct AR UI concepts for trajectory guidance, both with and without explicit orientation constraints. We analyzed spatial, orientational, and speed compliance based on the internal AR tracking, which was validated against a high-precision external optical tracking system to rule out hardware drift. By segmenting the execution into transient and steady-state phases and applying Aligned Rank Transform (ART) ANOVA, we isolated the interaction effects between visual design and task complexity. Alongside subjective metrics (NASA-TLX, SUS), our results establish conservative performance baselines for novice users performing freehand 5D trajectory following. We reveal orientation-induced cognitive-motor trade-offs and identify mitigating UI synergies. Ultimately, we provide empirical baselines and actionable design guidelines for developing effective AR guidance systems.
cs.HC 2026-05-11 1 theorem

Human-reviewed pipeline creates consistent AI evaluation scenarios

Towards Apples to Apples for AI Evaluations: From Real-World Use Cases to Evaluation Scenarios

Structured worksheets from experts plus iterative checks turn high-level use cases into detailed scenarios grounded in real needs and metrics.

AI measurement science has a wide variety of methodologies and measurements for comparing AI systems, resulting in what often appear to be "apples-to-oranges" comparisons across AI evaluations. To move toward "apples-to-apples" comparisons in real-world AI evaluations, this work advocates for methodological transparency in evaluation scenarios, operational grounding, and human-centered design (HCD) principles. We propose a repeatable process for transforming high-level use cases to detailed scenarios by eliciting use cases from subject matter experts (SMEs) via a structured AI Use Case Worksheet with six key elements: use case, sector, user (direct and indirect), intended outcomes, expected impacts (positive and negative), and KPIs and metrics. We demonstrate utility of the worksheet and process in the U.S. financial services sector. This paper reports on example high-level AI use cases identified by financial services sector SMEs: cyber defense enablement, developer productivity, financial crime aggregation, suspicious activity report (SAR) filing, credit memo generation, and internal call center support. These AI use cases provided are illustrative of the process and not exhaustive. Central to our work is a three-stage expansion pipeline combining LLM prompting with human reviews to generate 107 scenarios from those use cases elicited from SMEs. This process integrates iterative human reviews at every juncture to ensure operational grounding: for scenario titles and descriptions; for core scenario elements like users, benefits and risks, and metrics; and for scenario narratives and evaluation objectives. Human checkpoints ensure scenarios remain reflective of real-world usage and human needs. We describe a validation rubric to assess scenario quality. By defining key scenario components, this work supports a more consistent and meaningful paradigm for human-centered AI evaluations.
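The worksheet's six key elements map naturally onto a record type. A hypothetical sketch: the six elements and the SAR use case are from the paper, while the field names and example values are my own illustration:

```python
from dataclasses import dataclass

@dataclass
class AIUseCaseWorksheet:
    """Six key elements of the structured AI Use Case Worksheet."""
    use_case: str
    sector: str
    users: dict             # {"direct": [...], "indirect": [...]}
    intended_outcomes: list
    expected_impacts: dict  # {"positive": [...], "negative": [...]}
    kpis_and_metrics: list

# Illustrative instance; field values are assumptions, not from the paper.
sar_filing = AIUseCaseWorksheet(
    use_case="Suspicious activity report (SAR) filing",
    sector="U.S. financial services",
    users={"direct": ["compliance analysts"], "indirect": ["regulators"]},
    intended_outcomes=["faster, more consistent SAR drafts"],
    expected_impacts={"positive": ["reduced backlog"], "negative": ["missed cases"]},
    kpis_and_metrics=["filing turnaround time"],
)
```

A fixed schema like this is what makes the downstream scenario expansion repeatable: every SME-elicited use case arrives with the same six slots filled.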
cs.HC 2026-05-11 Recognition

Virtual pet softens intrusive tourism alerts

Exploring a Virtual Pet to Provide Context Notifications in a Tourism Recommender System: a Pilot Study

Pilot study finds pet-mediated notifications feel more natural and give clearer reasons for safety-critical information.

While context-aware personalization has been widely explored in modern tourism Recommender Systems (RS), the delivery of real-time notifications remains a significant design challenge due to issues of intrusiveness and user fatigue. This paper presents a proof-of-concept for a tourism recommendation framework that utilizes a virtual pet as a social mediator for delivering context-aware alerts. The system integrates real-time environmental data - including air quality, noise levels, and weather forecasts - and proximity-based notifications with a Multi-Agent Microservice that generates personalized recommendations based on the user's personality traits and preferences. A within-subjects pilot study (n=11) was conducted to evaluate the feasibility and user acceptance of this pet-mediated approach. Participants interacted with two versions of the system - a baseline without contextual alerts and a version featuring pet-mediated notifications - over a four-week period (two weeks per version) in real-world scenarios. Quantitative and qualitative data were collected to assess engagement, perceived naturalness, notification utility, and acceptance. Preliminary results suggest that the virtual pet can effectively "soften" the perceived intrusiveness of system alerts, making safety-critical information feel more welcome and natural. Furthermore, the character-mediated justifications significantly improved the clarity of the notifications, effectively supporting users in their real-time travel decisions. These findings provide a foundation for using virtual pet companions to enhance the transparency and acceptance of context-aware communication in tourism RS.
cs.HC 2026-05-11 Recognition

Sycophantic AI makes users seek its advice nearly as much as from friends

Sycophantic AI makes human interaction feel more effortful and less satisfying over time

Three-week study finds people report lower satisfaction with real social interactions after using affirming AI for personal guidance

Millions of people now turn to artificial intelligence (AI) systems for personal advice, guidance, and support. Such systems can be sycophantic, frequently affirming users' views and beliefs. Across five preregistered studies (N = 3,075 participants, 12,766 human-AI conversations), including a three-week study with a census-representative U.S. sample, we provide longitudinal experimental evidence that sycophantic AI shifts how users approach their closest relationships. We show that sycophantic AI immediately delivers the emotional and esteem support users typically associate with close friends and family. Over three weeks of such interactions, users became nearly as likely to seek personal advice from sycophantic AI as from close friends and family, and reported lower satisfaction with their real-world social interactions. When given a choice among AI response styles, a majority preferred sycophantic AI -- not for the quality of its advice, but because it made them feel most understood. Together, these findings offer a relational account of AI sycophancy and its impacts.
cs.HC 2026-05-11 Recognition


XR sketches turn into constraints for controllable 3D generation

SpatialPrompt: XR-Based Spatial Intent Expression as Executable Constraints for AI Generative 3D Design

Users draw rough structures and speak details, and the AI must follow those spatial rules during iterative, shared model creation.

We present SpatialPrompt, an Extended Reality (XR) system that turns spatial sketches into executable constraints for controllable 3D generation. Users draw rough structures with a 3D pen and add voice prompts for semantic and stylistic intent. The system supports iterative refinement and synchronous co-creation in shared space with color-coded contributions. Implemented on Apple Vision Pro with Logitech Muse and Meshy, a heuristic evaluation suggests that the workflow is intuitive and supports shared understanding in collaborative creation, while revealing needs for faster generation and clearer feedback.
cs.HC 2026-05-11 Recognition

12-dimension framework organizes body doubling for ADHD adults

A Roadmap of Mixed Reality Body Doubling for Adults with ADHD

Maps motivation, agents, interactions and context to guide mixed-reality tools that help complete tasks.

Adults with ADHD may use a self-management technique known as Body Doubling, in which the participant employs the presence of one or more agents as a means of initiating and completing tasks. We developed a framework on body doubling with twelve dimensions to better understand the characteristics of body doubling and discover future research directions for developing and testing body doubling for adults with ADHD. Our framework accounts for individual motivation, agent-related dimensions, interaction-related dimensions, contextual dimensions, and efficacy. These dimensions show existing research gaps such as limited mixed reality prototypes, possibilities for more interactive body doubles, and the need for empirical studies to further our understanding of body doubling for adults with ADHD.
cs.HC 2026-05-11 Recognition

Interactive text maps match visual maps for spatial info

A Spatial Knowledge Acquisition Comparison Between Digital Visual Thematic Maps, Non-Visual Interactive Text Thematic Maps, and Tables

Study finds ITMs let both sighted and blind users learn geographic relationships better than data tables, challenging current accessibility practice.

Digital maps are used to communicate generalized spatial information and relationships, yet are commonly made "accessible" using tables that lack geographic information. This study examines whether these tables and interactive text maps (ITMs) may be comparable to visual maps. Twenty sighted and 20 blind and low-vision individuals (BLVIs) performed tasks designed to compare visual maps, ITMs, and tables. Participants answered numeric, geographic, and combined numeric geographic questions using each representation, and performance, preference, and NASA-TLX were measured. Across both participant groups, map representations (visual and ITMs) significantly outperformed tables on geographic-based questions, while performance differences were minimal for numeric questions. For sighted participants, performance on geographic questions did not significantly differ between visual maps and ITMs, indicating that a larger powered study may find an "equivalent purpose" across these two conditions. Participants preferred map-based representations over tables. Perceived workload was highest for the ITM, intermediate for the visual map, and lowest for the table. Consistent with the Map Equivalent Purpose Framework, these findings indicate that Web Content Accessibility Guidelines-compliant ITMs can provide access to spatial information, unlike tables. These findings challenge prevailing accessibility practice that recommends tables lacking geographic information as map alternatives, and motivate reconsideration of accessibility legislation exempting digital thematic maps.
cs.HC 2026-05-11 2 theorems

Everyday talks use satisficing and flow tactics more than optimal rules

Analyzing Human Heuristics and Strategies in Everyday Decision-Making Conversations for Conversational AI Design

Study of 955 real conversations finds common heuristics keep decisions moving while rarer ones close them, informing AI that matches human habits.

Conversational AI increasingly supports everyday decision-making, yet most systems rely on data-centric reasoning rather than the heuristic and interactional strategies people use in natural conversation. To ground design in actual human practice, we analyze 955 real-world Korean conversations (15,476 utterances) involving food and travel decisions, applying a decision-making codebook through an LLM-assisted coding pipeline. Our findings reveal that people prioritize satisficing over optimization, relying heavily on internal knowledge and interactional strategies to manage cognitive load. Critically, we identify a frequency-efficiency mismatch: the most prevalent heuristics sustain conversational flow during exploration, whereas infrequent, rule-based strategies are highly effective at driving resolution during exploitation. By mapping how these patterns transfer across the spectrum of human-AI interaction, this work provides empirical grounding consistent with cognitive theories of decision-making and offers design implications that align AI systems with human heuristic processes.
cs.HC 2026-05-11 Recognition

AI splits user stories into tasks with human help needed

Splitting User Stories Into Tasks with AI -- A Foe or an Ally?

Controlled test finds AI makes lists more detailed and complete but adds irrelevant items, so hybrid use works best.

In agile software development, breaking down user stories into actionable tasks is a critical yet time-consuming process. This paper investigates the potential of Generative AI tools to assist in task splitting, aiming to enhance planning efficiency. We conducted a controlled experiment comparing traditional task-splitting methods with AI-assisted approaches using GitLab Duo. Our findings indicate that while current AI tools are not yet mature enough to replace developers, they can aid in generating more granular task lists and ensuring no important tasks are overlooked. Participants favored a hybrid approach, combining AI tools with conventional methods to maintain high accuracy in planning. This study highlights the potential benefits and limitations of integrating Generative AI into agile development processes, suggesting that AI tools can serve as valuable aids in task splitting, provided there is human oversight to filter out irrelevant tasks.
cs.HC 2026-05-11 Recognition

Metaphor choice shapes youth privacy decisions as an ethical priority

Metaphors as Scaffolds: Spatial, Embodied, Fantastical, and Relational Framings for Youth Usable Privacy Design

Re-reading three studies with ages 13-24 identifies spatial, embodied, fantastical, and relational framings that each steer disclosure and boundary decisions.

Mainstream usable privacy design frames privacy as administrative work -- settings, toggles, consent checkboxes -- abstracted from the relational, contextual, and embodied registers in which youth reason about disclosure. Drawing on a cross-project reading of three prior studies with youth aged 13--24, we examine how the metaphors that scaffold a privacy interaction shape the reasoning young users bring to it. \textit{Spatial} metaphors reduce cognitive load by recruiting intuitions about navigating physical space. \textit{Embodied} metaphors furnish a shared moral vocabulary that makes implicit norms about public and private space negotiable among users. \textit{Fantastical} metaphors recast privacy management as discoverable play, raising engagement with the granular controls that nuanced self-presentation requires. \textit{Relational} metaphors, by contrast, can lead youth past their own stated boundaries when felt intimacy masks institutional data flow, a risk already visible in AI companion products. Metaphor selection, we argue, is best understood as a first-order ethical design decision for youth privacy.
cs.HC 2026-05-11 2 theorems

Lexicon retrieval matches zero-shot for natural Singlish

From Standard English to Singlish: A Retrieval-Augmented Approach for Code-Switched Creole Generation in Large Language Models

RAG swaps a median of one word from a curated list and keeps higher semantic similarity than prompting alone.

Code-switching in contact varieties like Singaporean English (Singlish) challenges natural language generation due to limited parallel data and rapid lexical evolution. We propose a retrieval-augmented generation (RAG) framework that externalizes dialectal knowledge into a curated lexicon, enabling controlled lexical code-switching without fine-tuning. Our approach retrieves candidate Singlish expressions and guides generation through sparse lexical substitution. Human evaluation with 164 Singaporean participants found RAG and zero-shot prompting equally natural and appropriate. Automatic analyses reveal different transformation regimes: zero-shot prompting induces extensive paraphrasing (median 23 token edits), whereas RAG performs minimal substitutions (median 1 edit) with higher semantic preservation (mean cosine similarity 0.978 vs. 0.926). Our results demonstrate that externalizing code-switching into lexical resources enables control and auditability without sacrificing perceived quality, offering practical advantages for rapidly evolving contact varieties.
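The sparse-substitution idea is the key contrast with free paraphrasing: most tokens are left untouched, so every edit is traceable to a lexicon entry. A minimal sketch, assuming a toy lexicon (the entries and the one-edit cap are illustrative, not the paper's curated resource):

```python
# Toy Standard-English -> Singlish lexicon; entries are illustrative assumptions.
SINGLISH_LEXICON = {
    "very": "damn",
    "tired": "shag",
    "eat": "makan",
}

def code_switch(sentence: str, lexicon: dict, max_edits: int = 1):
    """Replace at most `max_edits` tokens with retrieved lexicon entries,
    leaving the rest of the sentence untouched (sparse substitution)."""
    tokens = sentence.split()
    edits = 0
    out = []
    for tok in tokens:
        if edits < max_edits and tok.lower() in lexicon:
            out.append(lexicon[tok.lower()])
            edits += 1
        else:
            out.append(tok)
    return " ".join(out), edits

result, n_edits = code_switch("I am very tired today", SINGLISH_LEXICON)
```

Because the lexicon is external to the model, additions and corrections are auditable without retraining, which is the practical advantage the abstract highlights for rapidly evolving contact varieties.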
cs.HC 2026-05-08 Recognition

Three pillars guide youth social media design for better friendships

Social Understanding, Placeness, and Identity Alignment: A Design Framework for Friendship-Supportive Youth Social Media

Studies with 331 participants reveal nine design spaces that help friendships form, deepen, and last on platforms.

We present a design framework for friendship-supportive youth social media, derived from a synthesis of five empirical studies with 331 youth participants (ages 13--25) using interviews, co-design, surveys, diary studies, and a field deployment. Iterative analysis of 209 design-relevant data points identified three pillars: \textit{Sense of Social Understanding} (interaction norms, interaction cues and scaffolding, social accountability and governance), \textit{Sense of Place} (third place and community, boundaries and personal spaces, shared presence), and \textit{Sense of Identity Alignment} (identity currency, identity plurality, relational identity signals). The framework maps nine design spaces through which platforms can support the conditions under which youth friendships form, deepen, and are maintained. It offers a shared vocabulary for locating contributions, comparing design interventions, and identifying under-explored areas for future work.
cs.HC 2026-05-08 Recognition

Three misattunements shape youth social media design

Problem Space Attunement in Youth Social Media Design

Fictional inquiry, youth-led communities and LLM simulations produce criteria for platforms that support relationships instead of constraining them.

Social media is central to how young people maintain relationships, develop identity, and access communities, yet dominant platform designs often leave youth feeling constrained rather than supported. My dissertation argues that youth social media design is shaped by three forms of problem-space misattunement. \textit{Conceptual misattunement} occurs when the language of ``social media'' anchors participants to existing platform templates. I address this through Fictional Inquiry in a fictional magic-school setting that helps youth reason from felt relational needs. \textit{Definitional misattunement} occurs when researchers define what ``better'' means on youth's behalf. I address this through a Discord-based asynchronous community that supports youth-led collective inquiry. \textit{Evaluative misattunement} occurs when participants are asked to judge static or hypothetical designs. I address this through an ego-anchored, LLM-agent simulation sandbox. Together, these studies develop youth-grounded criteria and design directions for relationally supportive social media.
cs.HC 2026-05-08 Recognition

Banal deception frames hidden influence in generative AI

Exploring the "Banality" of Deception in Generative AI

This perspective on normalized influence in generative AI points to friction via awareness and tools for user protection.

Current approaches to addressing deceptive design largely focus on visible interface manipulations, commonly referred to as "dark patterns". With the rise of generative AI, deception is becoming more difficult to spot and easier to live with, as it is quietly embedded in default settings, automated suggestions, and conversational interactions rather than discrete interface elements. These subtle, normalised forms of influence, which Simone Natale frames as "banal deception", shape everyday digital use and blur the line between AI-enabled assistance and manipulation. This position paper explores banality as a lens through which to reason through deception in generative AI experiences, especially with chatbots. We explore what Natale describes as users' own involvement in their deception, and argue that this perspective could lead to future work for introducing friction to safeguard users from deception in generative AI interactions, such as empowering users through raising awareness, providing them with intervention tools, and regulatory or enforcement improvements. We present these concepts as points for discussion for the deceptive design scholarly community.
cs.HC 2026-05-08 2 theorems

Moodle AI tutor grounded in teacher content reaches 0.97 faithfulness

From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle

By basing answers on educator materials the system cuts misinformation and helps students move beyond surface recall to real comprehension.

This demo paper describes the development of the AI Teaching \& Learning Assistant, a modular Moodle plugin that leverages Retrieval-Augmented Generation (RAG) to deliver high-quality, hallucination-free education. The system employs a dual-centric design, providing students with interactive, Socratic-based tutoring and educators with a "human-in-the-loop" workspace for supervised content generation. By grounding Large Language Model (LLM) responses in teacher-provided materials, the assistant addresses the risks of misinformation while encouraging deep conceptual mastery. Evaluation via the Ragas (LLM-as-a-Judge) framework and a preliminary user study confirms its effectiveness, achieving faithfulness scores up to 0.97 and a 4.00/5.00 recommendation rate.
cs.HC 2026-05-08 Recognition

fNIRS separates intrinsic from extraneous load in VR training

Leveraging fNIRS to Evaluate Workload for Adaptive Training in Virtual Reality

Prefrontal and angular gyrus activity tracks task demands and matches subjective ratings, while extraneous load activates only one small region deemed task-irrelevant.

Advances in technology offer the potential for future adoption of a combination of virtual reality (VR) and real-time adaptivity to enhance training and education. Providing a valid neuro-ergonomic measure of cognitive load can enable an adaptive training regime to continuously adjust task difficulty to an optimal level as training progresses. The current study validated the functional near-infrared spectroscopy (fNIRS) measure of cognitive load to reflect the demands of two different forms of load within Cognitive Load Theory: extraneous and intrinsic to the task to be mastered. Thirty-six participants completed a VR shape assembly training task followed by a test of their skill retention. They wore near-full head coverage fNIRS and provided subjective ratings of their workload. The fNIRS findings largely corroborate the intrinsic workload literature with significant activation in cortical regions (dorsolateral and rostral prefrontal cortex and left angular gyrus) associated with working memory, short-term memory buffers, multisensory integration, and attention. These fNIRS results were tracked closely by NASA-TLX measures of mental workload. The results also revealed far less brain activity associated with extraneous load, namely just the right angular gyrus, deemed irrelevant to the mastery of the task.
cs.HC 2026-05-08 Recognition

Owners and non-owners differ on privacy factors in smart car cabins

Privacy Perceptions in Sensor-Powered Smart Vehicle Cabins

Interviews identify shared influences and group-specific ones to guide sensor designs that respect varied needs.

As car cabins evolve with the integration of diverse sensors, traditional car cabins are transforming into smart environments. This shift raises important questions about how privacy is understood and managed in such spaces. In this work, we investigate privacy perceptions from the perspectives of both vehicle owners (i.e., people who purchase and own cars) and non-owners (i.e., people who temporarily use cars, such as family members, friends, or renters). Through semi-structured interviews with eighteen participants, we identified key factors that influence these groups' views on privacy. Our findings reveal factors that commonly influence privacy preferences for both owners and non-owners, as well as factors that have a stronger impact on one group over the other. Drawing on these insights, we discuss design implications for future designs to better support and balance the diverse privacy needs of multiple stakeholders in smart car cabins.
cs.HC 2026-05-08 Recognition

Gaze offset fusion improves eye movement authentication

Enhancing Eye Movement Biometrics for User Authentication via Continuous Gaze Offset Score Fusion

Nonlinear score combination with standard features lowers errors on lab and VR datasets across tasks

Eye movement biometrics (EMB) use subject-specific gaze dynamics for user authentication and identification. Recent deep learning-based EMB systems achieve strong performance by modeling temporal eye movement behavior. However, these systems typically overlook continuous gaze offset, despite prior evidence that it contains user-discriminative information. This work examines whether continuous gaze offset can improve biometric performance when combined with existing biometric features. We evaluate linear and nonlinear fusion methods on two publicly available datasets, collected via a lab-grade eye tracker and a virtual reality headset across multiple tasks and observation durations. Results indicate that fusion offers performance benefits on both datasets, particularly when using nonlinear fusion. Additionally, fusing biometric information across multiple tasks further improves authentication performance. These findings support the hypothesis that continuous gaze offset may serve as useful auxiliary information under conditions of degraded or noisy eye tracking.
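Score-level fusion here means combining two match scores rather than two feature vectors. A minimal sketch of one possible nonlinear combination: a shifted logistic over a weighted sum with an interaction term. The weights, offset, and functional form are assumptions for illustration, not the paper's fusion model:

```python
import math

def fuse_scores(s_primary: float, s_offset: float,
                w1: float = 2.0, w2: float = 1.0, w12: float = 0.5) -> float:
    """Fuse a primary biometric score with an auxiliary gaze-offset score.

    Inputs are match scores in [0, 1]; the interaction term w12 makes the
    combination nonlinear, and the shifted logistic maps it back to (0, 1).
    All parameters are illustrative assumptions.
    """
    z = w1 * s_primary + w2 * s_offset + w12 * s_primary * s_offset
    return 1.0 / (1.0 + math.exp(-(z - 1.5)))

genuine = fuse_scores(0.9, 0.8)   # both scores agree the user matches
impostor = fuse_scores(0.3, 0.2)  # both scores are low
```

The interaction term rewards agreement between the two channels, which is one way auxiliary information like gaze offset can help when the primary signal is degraded.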
cs.HC 2026-05-08

Multimodal dataset adds radar and motion data to sign language recognition

SIGMA-ASL: Sensor-Integrated Multimodal Dataset for Sign Language Recognition

SIGMA-ASL supplies 93,545 aligned clips of 160 ASL signs from video, radar and wrist sensors to overcome limits of vision-only systems.

Automatic sign language recognition (SLR) has become a key enabler of inclusive human-computer interaction, fostering seamless communication between deaf individuals and hearing communities. Despite significant advances in multimodal learning, existing SLR research remains dominated by vision-based datasets, which are limited by sensitivity to lighting and occlusion, privacy concerns, and a lack of cross-modal diversity. To address these challenges, we introduce SIGMA-ASL, a large-scale multimodal dataset for SLR. The dataset integrates an Azure Kinect RGB-D camera, a millimeter-wave (mmWave) radar, and two wrist-worn inertial measurement units (IMUs) to capture complementary visual, radio-reflection, and kinematic information. Collected in a controlled studio environment with 20 participants performing 160 common American sign language (ASL) signs, SIGMA-ASL provides 93,545 temporally synchronized word-level multimodal clips. A unified sensing framework achieves millisecond-level alignment across modalities, enabling reliable sensor fusion and cross-modal learning. We further design standardized preprocessing pipelines and benchmarking protocols under both user-dependent and user-independent settings, offering a comprehensive foundation for evaluating single and multimodal SLR. Extensive experiments validate the dataset's quality and demonstrate its potential as a valuable resource for developing robust, privacy-preserving, and ubiquitous sign language recognition systems.
cs.HC 2026-05-08

Reliance on AI can push human knowledge into a low-diversity trap

Human-AI Co-Evolution and Epistemic Collapse: A Dynamical Systems Perspective

A dynamical model shows feedback between users and language models may lead to degenerative convergence rather than ongoing improvement.

Figure from the paper
Abstract
Large language models (LLMs) are reshaping how knowledge is produced, with increasing reliance on AI systems for generation, summarization, and reasoning. While prior work has studied cognitive offloading in humans and model collapse in recursive training, these effects are typically considered in isolation. We propose a unified perspective: humans and language models form a coupled dynamical system linked by a feedback loop of usage, generation, and retraining. We introduce a minimal model with three variables -- human cognition, data quality, and model capability -- and show that this feedback can give rise to distinct dynamical regimes. Our analysis identifies three regimes: co-evolutionary enhancement, fragile equilibrium, and degenerative convergence. Through a simple simulation, we demonstrate that increasing reliance on AI can induce a transition toward a low-diversity, suboptimal equilibrium. From an information-theoretic perspective, this transition corresponds to an emergent information bottleneck in the human-AI loop, where entropy reduction reflects loss of diversity and support under closed-loop feedback rather than beneficial compression. These results suggest that the trajectory of AI systems is shaped not only by model design, but by the dynamics of human-AI co-evolution.
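The abstract's three-variable feedback loop can be sketched as a toy simulation. The equations and coefficients below are illustrative assumptions, not the paper's model; they only show how increasing reliance can tip the coupled system toward a degraded equilibrium.

```python
def simulate(reliance, steps=500, dt=0.05):
    """Euler-integrate a toy human-AI loop: human cognition h,
    data quality q, model capability m, coupled by AI reliance.
    All coefficients are made up for illustration."""
    h, q, m = 1.0, 1.0, 1.0
    for _ in range(steps):
        dh = 0.5 * (1 - reliance) - 0.1 * reliance * m - 0.2 * h  # offloading erodes practice
        dq = 0.4 * h - 0.3 * reliance * m - 0.1 * q               # data quality tracks human input
        dm = 0.5 * q - 0.2 * m                                    # retraining tracks data quality
        h, q, m = h + dt * dh, q + dt * dq, m + dt * dm
    return h, q, m
```

Under these assumed coefficients, high reliance drives the system toward a low-cognition, low-quality state, mirroring the "degenerative convergence" regime, while low reliance settles near a higher equilibrium.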
cs.HC 2026-05-08

LLM ADHD personas stay self-consistent but drift without scripted prompts

LLM-Based Educational Simulation: Evaluating Temporal Student Persona Stability Across ADHD Profiles

High-intensity traits hold across conversations while behaviors shift in free dialog, yet explicit task scripts remove the drift entirely.

Figure from the paper
Abstract
Student simulation with large language models (LLMs) offers a scalable alternative for educational research and teacher training. Yet, its validity depends on whether models maintain stable personas across extended interactions. We test this prerequisite using a dual-assessment framework measuring self-reported characteristics and observer-rated behavioral expressions. Across two experiments testing four clinically-grounded ADHD persona conditions, five LLMs, and three prompt designs, we quantify between-conversation stability (N=4,968) and within-conversation stability (N=3,952 across 9 turns). Self-reported characteristics remain stable for high intensities, constituting a necessary prerequisite for valid behavioral simulation. Observer-rated behavioral expression reveals selective instability: within-conversation drift occurs in unscripted dialog for high and moderate ADHD personas. Scripted interactions with explicit task prompts eliminate this drift entirely. Stable, persona-aligned simulated learners benefit from a structured interaction design to maintain behavioral coherence, which holds significant implications for teacher training, adaptive tutoring, and any application requiring sustained, path-dependent learner interactions.
cs.HC 2026-05-08

LLM tool raises online learning outcomes and satisfaction

LearnMate²: Design and Evaluation of an LLM-powered Personalized and Adaptive Support System for Online Learning

Custom plans and real-time help from LearnMate² beat standard platforms plus separate AI support in small evaluations.

Figure from the paper
Abstract
Personalization is crucial for effective learning, yet online learning, designed for widespread availability and open access, lacks personalized guidance. Recent advancements in large language models (LLMs) offer opportunities to bridge this gap. We explore how LLM-driven tools may be designed to support personalized and adaptive learning and examine how they shape user experience and learning outcomes. We iteratively designed LearnMate² to support online learning by providing personalized study plans, real-time contextual assistance, and adaptive learning activities. A preliminary study (n = 24) assessed the effectiveness and usability of LearnMate² and informed refinements in our system, which we then evaluated (n = 16) against a combination of a state-of-the-art online learning platform and an LLM for learning support. Results indicate that LearnMate² advances AI pedagogy by improving both learning outcomes and user experience compared to existing online learning and support tools. This work advances our understanding of the design space of personalized, AI-driven educational tools and their potential impact on user experience.
cs.HC 2026-05-08

Reinforcement learning optimizes emotion metrics that gradients cannot touch

AffectGPT-RL: Revealing Roles of Reinforcement Learning in Open-Vocabulary Emotion Recognition

AffectGPT-RL uses rewards from emotion wheels to improve open-vocabulary recognition and reach state-of-the-art on basic emotion benchmarks.

Figure from the paper
Abstract
Open-Vocabulary Multimodal Emotion Recognition (OV-MER) aims to predict emotions without being constrained by predefined label spaces, thereby enabling fine-grained emotion understanding. Unlike traditional discriminative methods, OV-MER leverages generative models to capture the full spectrum of emotions and employs emotion wheels (EWs) for metric calculation. Previous approaches primarily rely on token-level loss during training. However, this objective is misaligned with the metrics used in OV-MER, and these metrics cannot be directly optimized via gradient backpropagation. To address this limitation, we turn our attention to reinforcement learning, as this strategy can optimize non-differentiable objectives. We term this framework AffectGPT-RL. Furthermore, we conduct extensive experiments to elucidate the role of reinforcement learning in this task, revealing the necessity of the reasoning process, the impact of different rewards, and the generalizability to other emotion tasks such as sentiment analysis and basic emotion recognition. Experimental results demonstrate that AffectGPT-RL yields significant performance improvements on OV-MER. Beyond this task, we also achieve remarkable performance gains on basic emotion recognition, attaining state-of-the-art results on MER-UniBench. To the best of our knowledge, this is the pioneering work exploring the role of reinforcement learning in OV-MER, providing valuable guidance for subsequent researchers. Our code is provided in the supplementary material and will be released to facilitate future research.
cs.HC 2026-05-08

New column type places event sequences inside tables

EventColumn: Integrating Event Sequences into Tabular Visualizations

EventColumn lets analysts compare sequences with other attributes at row and group levels using compressed views, heatmaps, alignment, and boxplots of similar historical items.

Figure from the paper
Abstract
We introduce EventColumn, a new column type that integrates event-sequence data with heterogeneous tabular attributes into a single unified table. EventColumn lets analysts compare event sequences alongside numerical, categorical, and temporal attributes at both instance and group levels, offering a compressed overview, heatmap group summaries, alignment by event types, and boxplots of similar historical items. We developed EventColumn together with collaborators from the steel industry to facilitate the analysis of production events and warehouse logistics, but the solution generalizes to a wide range of event sequence datasets with additional tabular attributes. Unlike most existing approaches that compare either event sequences or tables, EventColumn supports simultaneous comparison of both. We demonstrate its integration with Taggle and Microsoft Power BI on data from steel production logistics and on a public e-commerce dataset.
cs.HC 2026-05-08

ICA artifact removal shows no consistent EEG decoding gain

I see artifacts: ICA-based EEG artifact removal does not improve deep network decoding across three BCI tasks

Across three BCI tasks, data cleaned by ICA performed similarly to raw signals in multiple deep networks.

Figure from the paper
Abstract
In this paper, we conduct a detailed investigation of the effect of independent component (IC)-based noise rejection methods on neural network classifier-based decoding of electroencephalography (EEG) data across different task datasets. We apply a pipeline matrix of two popular IC decomposition methods (Infomax and Adaptive Mixture Independent Component Analysis (AMICA)) with three different component rejection strategies (none, ICLabel, and the multiple artifact rejection algorithm [MARA]) on three different EEG datasets (motor imagery, long-term memory formation, and visual memory). We cross-validate processed data from each pipeline with three architectures commonly used for EEG classification (two convolutional neural networks and one long short-term memory-based model). We compare decoding performances at within-participant and within-dataset levels. Our results show that the benefit of IC-based noise rejection for decoding analyses is at best minor: component-rejected data did not show consistently better performance than data without rejection, especially given the significant computational resources required for independent component analysis (ICA) computations.
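The evaluation grid described above can be enumerated explicitly. The classifier identifiers are placeholders (the abstract specifies only two CNNs and one LSTM-based model, not their names):

```python
from itertools import product

DECOMPOSITIONS = ["infomax", "amica"]
REJECTIONS = ["none", "iclabel", "mara"]
DATASETS = ["motor_imagery", "long_term_memory", "visual_memory"]
MODELS = ["cnn_a", "cnn_b", "lstm"]  # placeholder names for the three architectures

def pipeline_matrix():
    # Every decomposition x rejection x dataset x classifier combination:
    # 2 * 3 * 3 * 3 = 54 cross-validated decoding configurations.
    return list(product(DECOMPOSITIONS, REJECTIONS, DATASETS, MODELS))
```

Enumerating the matrix up front makes it easy to run and compare all 54 cells under identical cross-validation splits.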
cs.HC 2026-05-08

Gaze and effort feedback lifts pair programming success

Can providing feedback on gaze and mental-effort synchrony improve pair programming performance?

Reactive and proactive cues from joint attention and mental effort raise debugging rates and cut task time in two studies.

Figure from the paper
Abstract
Pair programming is a widely used collaborative learning practice in computer science education, yet its effectiveness varies substantially due to breakdowns in coordination, attention, and cognitive regulation between partners. This paper investigates whether AI-supported feedback grounded in joint visual attention and joint mental effort can improve collaborative programming performance, and how feedback timing shapes learner-AI interaction. Two experimental studies use dual eye tracking to capture real-time indicators of collaborative regulation during debugging tasks. Study 1 examines reactive feedback that intervenes when observed joint visual attention or joint mental effort deviates beyond predefined thresholds, while Study 2 evaluates proactive feedback that forecasts future regulatory breakdowns using machine learning models and intervenes pre-emptively. Across both studies, feedback effectiveness is assessed through debugging success, time on task, and feedback uptake reflected in code changes. Multimodal feedback significantly improves collaborative performance compared to no-feedback conditions. Reactive feedback yields strong gains in debugging success and efficiency, particularly when joint visual attention and joint mental effort feedback are combined. Proactive, forecast-based feedback further enhances performance, reduces time on task, and increases constructive feedback uptake while relying less on intrusive interventions. Proactive feedback also better preserves learner agency by maintaining optimal collaboration states, particularly for high-performing pairs. These findings demonstrate that gaze and mental-effort synchrony can serve as reliable, actionable triggers for AI-supported collaborative learning, highlighting the importance of feedback timing, transparency, and anticipatory regulation in supporting effective pair programming.
cs.HC 2026-05-08

Gaze data lets LLM predict cognitive load without retraining

GazeMind: A Gaze-Guided LLM Agent for Personalized Cognitive Load Assessment

Structured eye movements plus task guidance and user history enable accurate assessments across people and activities.

Figure from the paper
Abstract
Smart glasses with AI assistants are increasingly used in daily life. However, current systems lack awareness of the user's internal cognitive state, leaving them unable to proactively anticipate users' needs without access to cognitive load. Existing methods for assessing cognitive load either rely on impractical sensors for lightweight eyewear or utilize eye gaze-based models that suffer from poor interpretability, and require task-specific fine-tuning, often failing to generalize across individuals. We propose GazeMind, a gaze-guided LLM agent framework for cognitive load assessment on smart glasses. It encodes eye-tracking data into structured representations for LLM-based reasoning and provides interpretable cognitive load predictions. Importantly, GazeMind generalizes across scenarios without LLM fine-tuning through a novel task-guidance reasoning approach and achieves personalized adaptation by incorporating user-specific characteristics and historical references. To support evaluation, we introduce CogLoad-Bench, the largest gaze-based cognitive load dataset with 152 participants, 40+ hours of multimodal data, and 10K+ real-time annotations across controlled and real-world tasks. Experiments show that GazeMind achieves state-of-the-art performance, outperforming baselines by over 20% across all metrics.
cs.HC 2026-05-08

Early LLM sessions lock in lasting user patterns

Priming, Path-dependence, and Plasticity: Understanding the molding of user-LLM interaction and its implications from (many) chat logs in the wild

140K real chats show rapid stabilization of styles and less exploration than open interfaces allow, creating an agency paradox.

Figure from the paper
Abstract
User interactions with LLMs are shaped by prior experiences and individual exploration, but in-lab studies do not provide system designers with visibility into these in-the-wild factors. This work explores a new approach to studying real-world user-LLM interactions through large-scale chat logs from the wild. Through analysis of 140K chatbot sessions from 7,955 anonymized global users over time, we demonstrate key patterns in user expressions despite varied tasks: (1) LLM users are not tabula rasa, nor are they constantly adapting; rather, interaction patterns form and stabilize rapidly through individual early trajectories; (2) Longitudinal outcomes, such as recurring text patterns and retention rates, are strongly correlated with early exploration; (3) Parallel dynamics are present, including organizing expressions by task types such as emotional support, or in response to model-version updates. These results present an "agency paradox": despite LLM input spaces being unconstrained and user-driven, we in fact see less user exploration. We call for design consideration surrounding the molding procedure and its incorporation in future research.
cs.HC 2026-05-08

Persona use in AI red-teaming raises attack success rates

PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI

A workflow and playground interface let red-teamers author personas and collaborate with AI to find more model vulnerabilities with diverse strategies.

Figure from the paper
Abstract
Recent developments in AI safety research have called for red-teaming methods that effectively surface potential risks posed by generative AI models, with growing emphasis on how red-teamers' backgrounds and perspectives shape their strategies and the risks they uncover. While automated red-teaming approaches promise to complement human red-teaming through larger-scale exploration, existing automated approaches do not account for human identities and rarely incorporate human inputs. In this work, we explore persona-driven red-teaming to advance both automated red-teaming and human-AI collaboration. We first develop PersonaTeaming Workflow, which incorporates personas into the adversarial prompt generation process to explore a wider spectrum of adversarial strategies. Compared to RainbowPlus, a state-of-the-art automated red-teaming method, PersonaTeaming Workflow achieves higher attack success rates while maintaining prompt diversity. However, since automated personas only approximate real human perspectives, we further instantiate PersonaTeaming Workflow as PersonaTeaming Playground, a user-facing interface that enables red-teamers to author their own personas and collaborate with AI to mutate and refine prompts. In a user study with 11 industry practitioners, we found that PersonaTeaming Playground enabled diverse red-teaming strategies and outputs that practitioners perceived as useful, and that AI-generated suggestions in the PersonaTeaming Playground encouraged out-of-the-box thinking even when practitioners did not follow them strictly. Together, our work advances both automated and human-in-the-loop approaches to red-teaming, while shedding light on interaction patterns and design insights for supporting human-AI collaboration in generative AI red-teaming.
cs.HC 2026-05-08

Social media stalls caring at awareness

The Capacity to Care: Designing Social Technology for Sustained Engagement With Societal Challenges

Tronto's ethics framework shows platforms can support responsibility, competence and community to reduce overwhelm from global issues.

Figure from the paper
Abstract
People care about climate change, injustice, and humanitarian crises. The challenge is not apathy but capacity: sustained engagement with large-scale problems is psychologically costly, and social media architecture often amplifies awareness while providing few pathways to meaningful action. The result is rising distress, overwhelm, and disengagement -- particularly among young people who encounter global suffering through platforms designed for attention capture rather than constructive response. This workshop examines how social technology design shapes the conditions for sustained engagement with societal challenges. Drawing on Tronto's care ethics framework and research in moral psychology and platform studies, we ask why caring at scale is difficult and how social media can both exacerbate and potentially mitigate this difficulty. Tronto's framework shows that good care requires more than awareness: it demands responsibility, competence, and community. Dominant social media architectures stall the caring process at its earliest phase. We invite researchers and designers to identify platform designs that deplete or support the capacity to care, and to develop design directions for sustainable care: engagement that people can maintain over time without burning out.
cs.HC 2026-05-08

AI usability needs probability distributions

UX in the Age of AI: Rethinking Evaluation Metrics Through a Statistical Lens

Legacy metrics assume deterministic outputs, but AI systems are stochastic, so three new indices capture entropy, drift over sessions, and usability confidence under uncertainty.

Abstract
The rapid proliferation of artificial intelligence (AI) in consumer-facing digital products has disrupted the assumptions underlying classical user experience (UX) evaluation frameworks. Legacy metrics such as the System Usability Scale (SUS), Net Promoter Score (NPS), and task completion rate were engineered for deterministic, rule-based interfaces where identical inputs yield identical outputs. In AI-mediated systems -- spanning conversational agents, generative interfaces, and recommendation engines -- outputs are stochastic, context-sensitive, and temporally variable, rendering these metrics structurally insufficient. This paper introduces the Adaptive Dynamic UX Statistical Framework (ADUX-Stat), a novel evaluation model that reconceptualises usability as a probabilistic signal distribution rather than a static scalar score. ADUX-Stat integrates three original constructs: (1) Interaction Entropy Index (IEI), quantifying the unpredictability of AI responses from a user perception standpoint; (2) Temporal Drift Coefficient (TDC), measuring longitudinal degradation or improvement of perceived usability over interaction sessions; and (3) Bayesian Usability Confidence Score (BUCS), producing credible interval estimates of usability quality under uncertainty. The framework is validated conceptually against five established AI product categories. ADUX-Stat addresses a critical gap at the intersection of HCI research, statistical modelling, and AI product evaluation, offering a reproducible, field-deployable methodology for UX practitioners and researchers alike.
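The three ADUX-Stat constructs can be given one plausible operationalization. The abstract names the indices but not their formulas, so everything below is an assumption: Shannon entropy for IEI, a least-squares slope for TDC, and a normal approximation to a Beta posterior for BUCS.

```python
import math
from statistics import NormalDist

def interaction_entropy_index(category_counts):
    # IEI sketch: Shannon entropy (bits) over how users categorized
    # the AI's responses; higher = more perceived unpredictability.
    total = sum(category_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in category_counts if c)

def temporal_drift_coefficient(session_scores):
    # TDC sketch: least-squares slope of usability scores across
    # sessions; negative = longitudinal degradation.
    n = len(session_scores)
    mx, my = (n - 1) / 2, sum(session_scores) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(session_scores))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

def bayesian_usability_confidence(successes, trials, level=0.95):
    # BUCS sketch: credible interval for task-success probability from
    # a Beta(1+s, 1+f) posterior, via a normal approximation.
    a, b = 1 + successes, 1 + trials - successes
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(var)
    return mean - half, mean + half
```

The point of the sketch is the shift the paper argues for: each index returns a distributional quantity (entropy, trend, interval) rather than a single static score.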
cs.HC 2026-05-08

AI safety tools interrupt older adults seeking emotional support

Designing with Tensions: Older Adults' Emotional Support-Seeking Under System-Level Constraints in Conversational AI

Interviews show safety redirects often break emotional flow and can leave users feeling more distressed instead of protected.

Abstract
Older adults have increasingly turned to conversational AI as a source of emotional support. However, little is known about how emotionally supportive interactions are experienced in everyday use, particularly when AI systems limit, redirect, or intervene during these interactions. We interviewed 18 older adults about their experiences using conversational AI for emotional support, examining when they turn to AI, how they engage during emotionally vulnerable moments, and how they respond when support feels disrupted. Our findings show that older adults often rely on AI when other forms of social support feel inaccessible. However, current safety-related interventions can redirect interactions in ways that participants experience as interruptions to emotional engagement or as shifts in control away from them. Such disruptions can undermine older adults' ability to remain emotionally engaged and, in some cases, contribute to emotional distress. We discussed design implications for emotionally supportive conversational AI, emphasizing the need for safety interventions that are enacted within older adults' social contexts, align with users' emotional pacing, and preserve their sense of agency.
cs.HC 2026-05-07

Pitch-speed ML models drop from 0.91 to 0.38 R-squared on new pitchers

Cross-individual generalizability of machine learning models for ball speed prediction in baseball pitching

Trunk and pivot leg retain usable signals across athletes while overall models overestimate intermediate pitchers.

Abstract
Although machine learning (ML)-based performance outcome prediction is an important topic in contemporary sports science, one important issue is the limited understanding of the cross-individual generalizability of ML models in sports contexts. To address this issue, this study aimed to evaluate the cross-individual generalizability of ML models for predicting ball speed in baseball pitching. A dataset comprising 50 pitchers from various competitive levels was analyzed. Cross-individual generalizability was assessed using leave-one-subject-out cross-validation. Specifically, the effects of expertise level and restrictions on spatiotemporal motion information were examined to identify factors influencing model generalizability. The results revealed that, under cross-individual evaluation, (1) predictive performance was markedly lower than under within-individual evaluation, with R-squared value decreasing from 0.91 to 0.38; (2) the model tended to overestimate the performance of Intermediate pitchers relative to Expert pitchers, with a significant group difference in signed prediction error (p < .05); and (3) the trunk and pivot leg demonstrated relatively high generalization performance, with the pivot leg showing notable generalizability even during the weight-shift initiation phase (R-squared value > 0.25). These findings underscore the importance of cross-individual evaluation in enhancing the practical applicability of ML in sports settings and contribute to a deeper understanding of the biomechanical factors underlying the target movement.
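Leave-one-subject-out evaluation as described can be sketched with a plain least-squares predictor. The paper's actual ML models are more complex; this shows only the cross-validation structure that separates within-individual from cross-individual performance.

```python
import numpy as np

def loso_r2(X, y, subjects):
    """Leave-one-subject-out CV: fit ordinary least squares on all other
    subjects, predict the held-out subject, then score the pooled
    predictions with R-squared."""
    preds = np.empty(len(y), dtype=float)
    for s in np.unique(subjects):
        held = subjects == s
        Xtr = np.c_[np.ones((~held).sum()), X[~held]]  # add intercept column
        w, *_ = np.linalg.lstsq(Xtr, y[~held], rcond=None)
        preds[held] = np.c_[np.ones(held.sum()), X[held]] @ w
    ss_res = float(((y - preds) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot
```

Because every prediction comes from a model that never saw that pitcher, the pooled R-squared directly measures cross-individual generalizability, which is how a within-individual 0.91 can fall to 0.38 under this scheme.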
cs.HC 2026-05-07

Eye contact relies on naming and visibility work for visually impaired collaborators

The Ambivalent Experience of Eye Contact for People with Visual Impairments: Mechanisms and Design Challenges

Three mechanisms show why gaze visibility alone falls short and point to configurable interaction contracts that reduce fatigue and bias.

Figure from the paper
Abstract
In mixed-ability collaboration, eye contact is often treated as a default cue for attention and turn-taking. As these signals are primarily visual, they are not reliably accessible to people with visual impairments. While prior work emphasized technical solutions, mechanism-level explanations of their experiences with sighted partners remain scarce. We interviewed 17 people with visual impairments about everyday interactions across work, education, and social settings. Using a critical-realist lens, we link events to plausible causal mechanisms and identify three recurring mechanisms: First, when gaze cannot allocate the floor, addressability hinges on explicit naming. Second, unclear speech entry cues and ongoing access work split attention and build fatigue, sometimes leading to withdrawal. Third, eye-contact norms can skew judgments of participation, prompting active management of visibility. We translate these mechanisms into five design challenges that reframe accessible eye contact as supporting configurable interaction contracts rather than merely making gaze visible.
cs.HC 2026-05-07

Hindsight surprise predicts which contrast people infer from why-questions

Why Someone Asked "Why": Foil Inference in Human and LLM Question Interpretation

Vignette experiments show post-outcome expectations outperform prior beliefs and similarity measures, while LLMs display inconsistent links.

Figure from the paper
Abstract
Explanations are inherently contrastive: E happened rather than E' because of C rather than C'. However, these contrasts, or "foils", are rarely mentioned explicitly but have to be inferred in context. Here, we investigate how people select the intended foil E' of a why-question. Participants read vignettes and judged, for each foil, their prior expectation (what will happen next), closeness (what is most similar to what happened), and hindsight expectation (what could have happened instead), as well as which foil they thought the question asker had in mind when they asked the why-question. We found that foil selections were best predicted by hindsight expectation judgments. This suggests that people infer the foil by considering what a question asker finds surprising after the outcome occurred. Since correct foil selection is relevant not only in human-human interaction but also increasingly in dialogues with large language models, we investigated their performance on the same task. The coupling between LLMs' explicit expectation judgments and their foil selections is inconsistent.
cs.HC 2026-05-07

Recovery codes lift LLM error handling by 28 percent

Every(bot) Makes Mistakes: Coding Big Five Personalities, Context, and Tone into an LLM Chatbot Recovery Code Framework

Mapping contexts to Big Five traits and tones lets chatbots produce more appropriate responses to mistakes.

Figure from the paper
Abstract
Despite careful design involving classifiers, parameters, and safeguarding, errors during human/AI interaction are not rare. Poor error recovery can disrupt interaction flow, damage user trust, and decrease user engagement. Whilst existing work has explored LLM recovery, tone, context, and personality as separate design dimensions, no existing work has combined these variables into a structured guidance framework. This paper presents a recovery code that maps four common LLM chatbot task contexts to associated personality traits (four Big Five personalities: Conscientiousness, Agreeableness, Openness, and Extraversion), tones, and three-stage recovery instructions. A recovery evaluation rubric was also designed, comprising three dimensions (Recovery quality, Tone alignment, and Appropriateness) and nine sub-dimensions. The methodology is exploratory, with no participants used. A between-subjects design was employed across two conditions: Condition A (baseline, uncoded), four separate Claude Sonnet 4.6 agents received no recovery code training; Condition B (coded), four separate Claude Sonnet 4.6 models were trained on the recovery code. Identical 'user' prompts and error scenarios were used across both conditions. Eight LLM evaluator agents assessed the recovery responses using the evaluation rubric, producing scores out of 5 for each sub-dimension. Results found a 27.8% average performance increase in coded recovery responses (76.7%) compared to baseline responses (48.9%). Condition B performed strongest in the appropriateness dimension (83.3%), with notable improvement in personality appropriateness (75% versus 50%) and providing explanation (60% versus 20%). These findings suggest that structured personality, context, and tone-informed recovery codes can be successfully learnt and applied by LLM chatbots to improve error recovery quality across varying contextual tasks.
cs.HC 2026-05-07

AI drafts halve audio description time only above quality threshold

Making AI Drafts Count: A Quality Threshold in Audio Description Workflows

Simple unguided prompts add little value; guided drafts that follow accessibility rules and video context deliver large cuts in time and load.

Audio description (AD) narrates visual elements in video for blind and low-vision audiences. Recent work has shown that giving novice describers an AI-generated draft to start from helps produce higher-quality AD and lowers the barrier to entry. What remains an open question is how draft quality shapes the editing process. We investigate this through GenAD, an AD generation pipeline that incorporates accessibility guidelines and contextual video information, and RefineAD, an editing interface for human revisions. Human-AI contributions are measured across text, timing, and delivery. In a within-subjects study, we compared authoring from scratch against editing AI drafts of varying quality. GenAD drafts cut completion time by more than half and significantly reduced cognitive load. In contrast, baseline drafts generated from simple, unguided prompts offered only modest benefits, pointing to a minimum quality threshold for effectiveness. Qualitative findings suggest this threshold is content-dependent; as visual complexity increases, so does the quality needed from AI drafts. We propose this as a design principle: effective AI assistance should clear a quality threshold suited to the target content, rather than simply be present.
cs.HC 2026-05-07

Hybrid LLM tailors scaffolding to diagnostic strategy

Tailoring Scaffolding to Diagnostic Strategies: Theory-Informed LLM-Based Agents

KLI framework aligns support to the knowledge demands of each strategy rather than using one fixed approach

Learning analytics systems increasingly integrate large language models (LLMs) to provide adaptive scaffolding in complex learning environments. Yet personalization is often driven by global instructional choices rather than principled alignment with learning theory, limiting both effectiveness and pedagogical grounding. In prior work, we examined how structuring and problematizing scaffolding approaches can be instantiated through LLM agents in a scenario-based learning environment for diagnostic reasoning. While both approaches supported learning, we observed systematic differences in learner interaction patterns and clear tendencies indicating that different diagnostic strategies benefited from distinct forms of scaffolding. Building on these findings, we propose a theory-informed scaffolding design grounded in the Knowledge-Learning-Instruction (KLI) framework, as different diagnostic strategies target different types of knowledge and require different instructional mechanisms. We use KLI to guide the alignment between strategy demands and scaffolding approaches and introduce a KLI-informed hybrid LLM agent that adapts its pedagogical support according to the diagnostic strategy being practiced, rather than applying a single global scaffolding approach. We hypothesize that this design will yield greater learning gains.

browse all of cs.HC → full archive · search · sub-categories