{"total":13,"items":[{"citing_arxiv_id":"2605.20024","ref_index":99,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Journeys of Parents with LGBTQ+ Children: How Trauma and Healing Reshape Identity and (Mis)Informating Practices","primary_cat":"cs.HC","submitted_at":"2026-05-19T15:48:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A qualitative study of South Korean parents shows that trauma and healing after learning a child is LGBTQ+ leads to identity reconstruction as supportive parents and more critical, protective informating practices.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15848","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Conversations in Space: Structuring Non-Linear LLM Interactions on a Canvas","primary_cat":"cs.HC","submitted_at":"2026-05-15T11:01:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CanvasConvo presents a spatial canvas interface for branching LLM conversations, evaluated in a 5-7 day field study with 24 participants that found support for exploratory workflows.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08650","ref_index":133,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Fast-Food Intimacy: How Chinese Women Navigate Soul's AI Boyfriend","primary_cat":"cs.HC","submitted_at":"2026-05-09T03:40:51+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Users experience fast-food intimacy with Soul's AI boyfriend that conflicts with gradual cultural expectations, introduces technical uncertainty, and shifts emotional labor onto 
women.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Across China, young people, especially urban women, are increasingly forming relationships with AI companions on platforms such as Soul, Glow, and Wantalk [7, 93]. The scale is notable: Soul reported 29.4 million monthly active users in 2022, roughly 80% of whom were Gen Z; by 2024, more than 3.59 million users engaged in romance-related discussions [133, 145]. The virality of the Hangzhou episode, combined with Soul's user scale, reflects broader cultural engagement with AI-mediated romance [67]. Together, these patterns raise a broader question for HCI: beyond interactional usability in conversational interfaces, how does AI"},{"citing_arxiv_id":"2604.19114","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"OOPrompt: Reifying Intents into Structured Artifacts for Modular and Iterative Prompting","primary_cat":"cs.HC","submitted_at":"2026-04-21T05:48:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"OOPrompt reifies user intents into structured manipulable artifacts to enable modular and iterative prompting in LLM-based interactive systems.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"tents, as well as categorical alignment scores determined by evaluative LLMs, for assessing consistency [13]. ChainForge also leverages evaluative LLMs, but to determine evaluation scores based on provided ground truth data [2]. Self-Supervised Prompt Optimization uses a reference-free evaluation paradigm, involving an LLM evaluator performing pairwise comparison of outputs generated by different prompts [50]. iPrOp involves a human-in-the-loop process and considers four key prompt properties: Performance on 
annotated data, readability and interpretability, quality of an explanation, and alignment of annotations [18]. Prompt Optimization with Human Feedback relies entirely on human preference feedback, training a model to predict a latent score for each prompt [19]."},{"citing_arxiv_id":"2604.18724","ref_index":53,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations","primary_cat":"cs.AI","submitted_at":"2026-04-20T18:22:31+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GROVE visualizes distributions of language model generations as overlapping paths through a text graph, with user studies showing that graph summaries aid structural judgments like diversity assessment while raw outputs remain better for details.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.11701","ref_index":81,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HeartSway: Exploring Biodata as Poetic Traces in Public Space","primary_cat":"cs.HC","submitted_at":"2026-04-13T16:37:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"An interactive public hammock captures and replays biodata as embodied traces, with a field study of ten users indicating it fosters anonymous connection and appreciation for shared vitality.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"For example, Huang et al. highlighted the audience's frisson moments when watching the same online video [33], and Hassib et al. annotated speech bubbles with emotion indicators [23]. 
Meanwhile, some other works tried to support positive human nature and artistic expression, incorporating biodata series into self-expressive clothes [39], accessories [81], and provocative art pieces [73]. To complement these studies, we aim to inquire into the innate qualities of human biodata and how they can be communicated. Hence, we focus on another scenario, where biodata is primarily communicated, perceived, and understood (cf. self-expression) and as the only information being conveyed (cf. augmenting other information channels)."},{"citing_arxiv_id":"2604.10925","ref_index":53,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Words to Widgets for Controllable LLM Generation","primary_cat":"cs.HC","submitted_at":"2026-04-13T02:46:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Malleable Prompting reifies subjective preferences from natural language into GUI widgets and modulates LLM token probabilities during decoding to enable controllable generation, with a user study showing improved precision and perceived controllability over standard prompting.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Association for Computing Machinery, New York, NY, USA, 1-15. doi:10.1145/3719160.3736618 [52] Rongzhi Zhang, Liqin Ye, Yuzhao Heng, Xiang Chen, Tong Yu, Lingkai Kong, Sudheer Chava, and Chao Zhang. 2025. Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing. arXiv:2510.12121 [cs] doi:10.48550/arXiv.2510.12121 [53] Zheng Zhang, Jie Gao, Ranjodh Singh Dhaliwal, and Toby Jia-Jun Li. 2023. VISAR: A Human-AI Argumentative Writing Assistant with Visual Programming and Rapid Draft Prototyping. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23). Association for Computing Machinery, New York, NY, USA, 1-30. 
doi:10.1145/3586183."},{"citing_arxiv_id":"2604.10587","ref_index":73,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CogInstrument: Modeling Cognitive Processes for Bidirectional Human-LLM Alignment in Planning Tasks","primary_cat":"cs.HC","submitted_at":"2026-04-12T11:15:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CogInstrument represents human reasoning as revisable cognitive motifs in graphical form to support iterative alignment with LLMs during planning tasks, with a N=12 study indicating gains in targeted revision, agency, and trust over standard dialogue interfaces.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"extracts concepts 𝑐 from given a user utterance 𝑢𝑡 by identification, disambiguation, and validation. These candidates are not treated as committed state; they are later grounded through cross-turn evidence, function-call support, and structural consistency checks. Causal Link Identification. Within these fixed schemas, LLM-assisted causal discovery proposes candidate dependency edges [73]. The runtime then grounds, questions, and stabilizes these candidates so that dependencies serve not only as inferential structure, but also as the editable backbone of the externalized reasoning graph. Although we illustrate the process with travel-style examples, the representation and interaction mechanisms are task-agnostic. 
Validation Through Proactive Questioning."},{"citing_arxiv_id":"2604.07643","ref_index":97,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Narrix: Remixing Narrative Strategies from Examples for Story Writing","primary_cat":"cs.HC","submitted_at":"2026-04-08T23:05:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Narrix helps novices identify and reuse narrative strategies from examples through visualization and strategy-steered generation, improving retention, confidence, and adaptation over chat interfaces in a 12-person study.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Nowadays, large language models (LLMs) enable writers to steer generation text through prompting [23, 24, 40, 81, 94, 100]. While prompt-based interactions offer flexibility, they also introduce several challenges for our target user and scenario. First, prior research has shown that users without AI expertise may struggle to design effective prompts [97]. This challenge is amplified when our target users are also novices in writing: they may lack narratological knowledge (e.g., narrative strategies), which hinders their ability to craft prompts that analyze example stories, extract strategies, and transfer them into their own writing. 
Second, conversational interfaces typically constrain users to a linear flow."},{"citing_arxiv_id":"2601.23206","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"High-quality generation of dynamic game content via small language models: A proof of concept","primary_cat":"cs.AI","submitted_at":"2026-01-30T17:30:59+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Proof-of-concept shows fine-tuned small language models achieve adequate quality for real-time game content generation in a scoped RPG loop via retry-until-success and LLM-as-judge evaluation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.11206","ref_index":92,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evalet: Evaluating Large Language Models through Functional Fragmentation","primary_cat":"cs.HC","submitted_at":"2025-09-14T10:24:13+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Evalet applies functional fragmentation to deliver fragment-level qualitative analysis of LLM evaluations, with a user study showing 48% more misalignment detections than holistic scoring.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"P Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. 2023. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. arXiv:2306.05685 [cs.CL] [91] Ming Zhong, Yang Liu, Da Yin, Yuning Mao, Yizhu Jiao, Pengfei Liu, Chenguang Zhu, Heng Ji, and Jiawei Han. 2022. Towards a unified multi-dimensional evaluator for text generation. arXiv preprint arXiv:2210.07197 (2022). [92] Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, et al. 2023. 
Sotopia: Interactive evaluation for social intelligence in language agents. arXiv preprint arXiv:2310.11667 (2023). Conference acronym 'XX, June 03-05, 2018, Woodstock, NY Tae Soo Kim*, Heechan Lee*, Yoonjoo Lee, Joseph Seering, and Juho Kim"},{"citing_arxiv_id":"2406.04244","ref_index":165,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Benchmark Data Contamination of Large Language Models: A Survey","primary_cat":"cs.CL","submitted_at":"2024-06-06T16:41:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"using chronological evidence) to provide evidence of this contamination. The paper also finds that for classification tasks without task contamination, LLMs show no significant improvement over simple majority baselines. Ranaldi et al. [118] introduced another method for detecting BDC in GPT models. They assessed GPT-3.5's performance using the well-known Spider Dataset [165] and a novel dataset called Termite. Additionally, they employed an adversarial table disconnection (ATD) approach, which complicates Text-to-SQL tasks by removing structural pieces of information from the database. 
This method allowed them to analyze GPT-3.5's efficacy on databases with modified information and assess the impact of BDC on the model's performance."},{"citing_arxiv_id":"2201.11903","ref_index":77,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models","primary_cat":"cs.CL","submitted_at":"2022-01-28T02:33:07+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":9.0,"formal_verification":"none","one_line_summary":"Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}