arxiv: 2605.01017 · v1 · submitted 2026-05-01 · 💻 cs.CL

Recognition: unknown

Psychologically Potent, Computationally Invisible: LLMs Generate Social-Comparison Triggers They Fail to Detect

Hua Zhao , Jiapei Gu , Michelle Mingyue Gu

Authors on Pith no claims yet

Pith reviewed 2026-05-09 19:07 UTC · model grok-4.3

classification 💻 cs.CL

keywords social comparisonLLM detectionbenchmarkXiaohongshuprompt-based classificationgeneration vs detectionreader elicitationrelational cues

0 comments

The pith

LLMs generate social-comparison triggers but fail to detect them with prompts

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces XHS-SCoRE, a benchmark built from first-person reader labels on Xiaohongshu posts to classify whether text elicits upward, downward, or neutral social comparison. It demonstrates a consistent gap where LLMs fluently produce posts that shift readers' perceived standing and comparison-related feelings, yet the same models show unstable detection when using prompts. Supervised Chinese encoder models can learn the signal when trained in-domain, but prompt-based classifiers exhibit repeatable failures including neutralization of triggering content and model-specific directional biases. This matters because it shows AI can create psychologically potent relational cues in text without reliable access to those cues through standard inference methods.

Core claim

The central claim is that LLMs display a generation-detection mismatch for social comparison: they can create Xiaohongshu-style posts that measurably alter perceived standing and affect, but prompted classifiers fail to identify the triggers reliably, with stable error patterns such as over-neutralization and skew. The XHS-SCoRE benchmark establishes that the underlying signal is textually learnable yet not robustly accessible to prompt-based classification.

What carries the argument

The XHS-SCoRE benchmark, which collects first-person reader judgments labeling posts as upward, downward, or neutral social comparison elicitors, functions as the diagnostic tool to expose the mismatch between fluent generation and fragile prompt-based detection.

If this is right

AI content generators may produce posts that influence self-perception without built-in ability to flag the mechanism.
Prompt engineering alone proves insufficient for reliable detection of subtle relational signals in social media text.
Supervised training on reader-labeled data succeeds where prompting fails, pointing to hybrid detection needs.
Generated posts can change comparison-related affect even when the model cannot recognize the eliciting features.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Content moderation systems relying on prompt-based LLM self-assessment may miss social comparison triggers in generated material.
Training on reader-grounded labels could help models handle other psychologically subtle cues beyond sentiment.
The mismatch raises questions about whether generation and detection of social signals require fundamentally different model access methods.

Load-bearing premise

That first-person reader labels on Xiaohongshu posts accurately capture the stable psychological experience of social comparison rather than artifacts from the platform or labeling process.

What would settle it

A controlled test in which prompted LLMs classify XHS-SCoRE posts with accuracy matching or exceeding supervised in-domain baselines, or in which LLM-generated posts produce no measurable shift in readers' perceived standing or affect.

Figures

Figures reproduced from arXiv: 2605.01017 by Hua Zhao, Jiapei Gu, Michelle Mingyue Gu.

**Figure 1.** Figure 1: Confusion matrices (test): primary zero-shot LLM vs best encoder baseline. % row-normalized. view at source ↗

read the original abstract

We introduce Xiaohongshu Social Comparison Reader Elicitation (XHS-SCoRE), a reader-grounded benchmark for detecting if a text-only Xiaohongshu (RedNote) post elicits UPWARD, DOWNWARD, or NEUTRAL/no clear social comparison from a first-person reader perspective. The task targets a socially meaningful relational signal that is behaviorally real yet not reducible to sentiment. Across prompted LLM classifiers and supervised Chinese encoder baselines, we find a consistent mismatch between generation fluency and reliable detection ability: the signal is textually learnable in-domain, but not robustly accessible to prompt-based classification. Prompted LLM classifiers exhibit stable, interpretable failure modes, especially neutralization of comparison-triggering posts and model-specific directional skew. A controlled pilot further shows that LLM-generated Xiaohongshu-style posts can shift perceived standing and comparison-related affect even when prompt-based detection of the same construct remains fragile. XHS-SCoRE contributes both a benchmark for reader-grounded comparison detection and a diagnostic framework for studying when socially meaningful relational cues remain only partially visible to prompt-based inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LLMs generate Xiaohongshu posts that shift readers' social comparison feelings but fail to detect the same triggers reliably under prompting, while fine-tuned encoders succeed in-domain.

read the letter

The core observation here is that prompted LLMs produce posts capable of moving readers' perceived standing and comparison-related affect, yet the same models cannot reliably classify those triggers when asked. Supervised Chinese encoders learn the signal from the XHS-SCoRE labels, but prompt-based approaches show stable errors such as neutralization and model-specific directional skews. The controlled pilot supplies some direct evidence that the generated content has measurable downstream effects on readers. This is the part that stands out as new: a reader-grounded benchmark focused on upward, downward, or neutral comparison elicitation rather than sentiment or toxicity alone, plus the explicit generation-versus-detection contrast. The work does a reasonable job framing the task as a relational signal that matters for user well-being and is not reducible to surface polarity. The pilot adds a useful check that the construct is behaviorally active. The main soft spot is the benchmark's grounding. The labels rest on first-person annotations from one platform, and the abstract gives no numbers on inter-rater agreement, test-retest stability, or correlation with established instruments. Without those checks it remains possible that supervised models are fitting platform-specific lexical patterns or annotation artifacts instead of a stable psychological signal. If that holds, the prompt failures would not demonstrate a general limitation in accessing relational cues. This paper is for people working on LLM social reasoning, AI safety around affective impact, or social computing benchmarks. It is worth sending to peer review because the empirical mismatch, if the validation steps check out, is concrete enough to test further and the pilot provides an initial anchor for the generation side.

Referee Report

2 major / 2 minor

Summary. The paper introduces the XHS-SCoRE benchmark consisting of Xiaohongshu (RedNote) posts labeled by first-person readers for eliciting upward, downward, or neutral social comparison. It reports that supervised Chinese encoder models learn the in-domain signal while prompted LLMs exhibit consistent failure modes (e.g., neutralization and directional skew) and cannot reliably detect it, despite a pilot showing that LLM-generated posts can still shift perceived standing and comparison-related affect.

Significance. If the reader labels validly index the intended psychological construct rather than platform artifacts, the work demonstrates a clear dissociation between LLMs' generative fluency and their prompt-based access to subtle relational signals. It supplies a new reader-grounded benchmark and diagnostic framework for studying partial visibility of psychologically meaningful cues, with credit due for the empirical mismatch finding and the controlled pilot design.

major comments (2)

[Benchmark construction] Benchmark construction section: no inter-rater reliability, label distribution, test-retest stability, or correlation with established instruments (e.g., INCOM) is reported for the first-person XHS-SCoRE annotations. This is load-bearing because the central generation-detection mismatch claim and the supervised-model success both presuppose that the labels capture stable psychological social-comparison elicitation rather than annotation artifacts or Xiaohongshu stylistic regularities.
[Pilot study] Pilot study section: the abstract and summary provide no details on sample size, statistical tests, confound controls, or how affect shifts were measured. Without these, the claim that LLM-generated posts shift comparison-related affect cannot be evaluated and remains preliminary, weakening the contrast with detection fragility.

minor comments (2)

[Abstract] Abstract: the phrase 'stable, interpretable failure modes' is used without a concrete example (e.g., neutralization rate or skew direction); adding one would improve immediate clarity.
[Terminology] Terminology: ensure 'UPWARD', 'DOWNWARD', and 'NEUTRAL' are defined once and used consistently in all tables and figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below, clarifying our methodological choices and indicating where revisions will be made to improve transparency.

read point-by-point responses

Referee: Benchmark construction section: no inter-rater reliability, label distribution, test-retest stability, or correlation with established instruments (e.g., INCOM) is reported for the first-person XHS-SCoRE annotations. This is load-bearing because the central generation-detection mismatch claim and the supervised-model success both presuppose that the labels capture stable psychological social-comparison elicitation rather than annotation artifacts or Xiaohongshu stylistic regularities.

Authors: We agree that greater transparency on the annotation process is warranted. In the revised manuscript we will report the full label distribution across UPWARD, DOWNWARD, and NEUTRAL categories. Inter-rater reliability statistics are not reported because the design intentionally collects first-person reader annotations; each label reflects an individual reader's subjective experience of comparison elicitation rather than an objective property of the post. Traditional IRR metrics are therefore not the appropriate validation criterion, and we will add an explicit discussion of this reader-grounded rationale in the limitations section. Test-retest stability and correlations with instruments such as INCOM were not collected in the present study; we will state this limitation clearly and identify it as a valuable direction for future validation. The fact that supervised Chinese encoders achieve strong in-domain performance nevertheless indicates that the labels encode a learnable signal that goes beyond platform-specific stylistic regularities. revision: partial
Referee: Pilot study section: the abstract and summary provide no details on sample size, statistical tests, confound controls, or how affect shifts were measured. Without these, the claim that LLM-generated posts shift comparison-related affect cannot be evaluated and remains preliminary, weakening the contrast with detection fragility.

Authors: We appreciate the referee highlighting the need for fuller reporting. Although the full manuscript contains the pilot details, the abstract and summary sections are indeed too terse. In the revision we will expand both the abstract and the dedicated pilot-study subsection to specify the sample size (N=50), the pre-post measurement of perceived standing and comparison-related affect, the use of paired statistical tests, and the confound controls (post length, topic category, and presentation order). These additions will allow readers to evaluate the pilot results directly and will strengthen the reported dissociation between generative capability and prompt-based detection. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical benchmark and evaluation are externally grounded

full rationale

The paper constructs XHS-SCoRE from independent first-person reader annotations on Xiaohongshu posts and then reports direct empirical comparisons between prompted LLM classifiers and supervised Chinese encoder baselines. No equations or parameters are fitted and then relabeled as predictions; no self-citations supply load-bearing uniqueness theorems or ansatzes; the central mismatch claim is an observed performance gap on the externally labeled data rather than a definitional or self-referential reduction. The derivation chain consists of standard benchmark creation followed by standard model evaluation and remains self-contained against external reader judgments.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the domain assumption that social comparison can be reliably elicited and labeled from first-person reader perspective on short social media text, plus the assumption that prompt-based LLM classification is a fair test of what the models can access.

axioms (1)

domain assumption Social comparison is a distinct relational signal separable from sentiment and reliably reportable by readers in first-person terms.
Invoked in the definition of the XHS-SCoRE task and the claim that the signal is behaviorally real.

pith-pipeline@v0.9.0 · 5498 in / 1256 out tokens · 39166 ms · 2026-05-09T19:07:55.214667+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

300 extracted references · 208 canonical work pages · 1 internal anchor

[1]

Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

2025
[2]

Research on LLM s-Empowered Conversational AI for Sustainable Behaviour Change

Chen, Ben. Research on LLM s-Empowered Conversational AI for Sustainable Behaviour Change. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

2025
[3]

Deep Reinforcement Learning of LLM s using RLHF

Levandovsky, Enoch. Deep Reinforcement Learning of LLM s using RLHF. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

2025
[4]

Conversational Collaborative Robots

Kranti, Chalamalasetti. Conversational Collaborative Robots. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

2025
[5]

Dialogue System using Large Language Model-based Dynamic Slot Generation

Hashimoto, Ekai. Dialogue System using Large Language Model-based Dynamic Slot Generation. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

2025
[6]

Towards Adaptive Human-Agent Collaboration in Real-Time Environments

Nakae, Kaito. Towards Adaptive Human-Agent Collaboration in Real-Time Environments. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

2025
[7]

Towards Human-Like Dialogue Systems: Integrating Multimodal Emotion Recognition and Non-Verbal Cue Generation

Jiang, Jingjing. Towards Human-Like Dialogue Systems: Integrating Multimodal Emotion Recognition and Non-Verbal Cue Generation. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

2025
[8]

Controlling Dialogue Systems with Graph-Based Structures

Hilgendorf, Laetitia Mina. Controlling Dialogue Systems with Graph-Based Structures. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

2025
[9]

Multimodal Agentic Dialogue Systems for Situated Human-Robot Interaction

Sucal, Virgile. Multimodal Agentic Dialogue Systems for Situated Human-Robot Interaction. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

2025
[10]

Knowledge Graphs and Representational Models for Dialogue Systems

Walker, Nicholas Thomas. Knowledge Graphs and Representational Models for Dialogue Systems. Proceedings of the 21st Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems. 2025

2025
[11]

Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.0

work page doi:10.18653/v1/2025.xllm-1.0 2025
[12]

Fine-Tuning Large Language Models for Relation Extraction within a Retrieval-Augmented Generation Framework

Efeoglu, Sefika and Paschke, Adrian. Fine-Tuning Large Language Models for Relation Extraction within a Retrieval-Augmented Generation Framework. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.1

work page doi:10.18653/v1/2025.xllm-1.1 2025
[13]

Benchmarking Table Extraction: Multimodal LLM s vs Traditional OCR

Nunes, Guilherme and Rolla, Vitor and Pereira, Duarte and Alves, Vasco and Carreiro, Andre and Baptista, M \'a rcia. Benchmarking Table Extraction: Multimodal LLM s vs Traditional OCR. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.2

work page doi:10.18653/v1/2025.xllm-1.2 2025
[14]

Injecting Structured Knowledge into LLM s via Graph Neural Networks

Li, Zichao and Ke, Zong and Zhao, Puning. Injecting Structured Knowledge into LLM s via Graph Neural Networks. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.3

work page doi:10.18653/v1/2025.xllm-1.3 2025
[15]

Regular-pattern-sensitive CRF s for Distant Label Interactions

Papay, Sean and Klinger, Roman and Pad \'o , Sebastian. Regular-pattern-sensitive CRF s for Distant Label Interactions. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.4

work page doi:10.18653/v1/2025.xllm-1.4 2025
[16]

From Syntax to Semantics: Evaluating the Impact of Linguistic Structures on LLM -Based Information Extraction

Swarup, Anushka and Bhandarkar, Avanti and Wilson, Ronald and Pan, Tianyu and Woodard, Damon. From Syntax to Semantics: Evaluating the Impact of Linguistic Structures on LLM -Based Information Extraction. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.5

work page doi:10.18653/v1/2025.xllm-1.5 2025
[17]

Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models

Willemsen, Bram and Skantze, Gabriel. Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.6

work page doi:10.18653/v1/2025.xllm-1.6 2025
[18]

Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis

Li, Daoyang and Zhao, Haiyan and Zeng, Qingcheng and Du, Mengnan. Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.7

work page doi:10.18653/v1/2025.xllm-1.7 2025
[19]

Self-Contrastive Loop of Thought Method for Text-to- SQL Based on Large Language Model

Kang, Fengrui and Tan, Mingxi and Huang, Xianying and Yang, Shiju. Self-Contrastive Loop of Thought Method for Text-to- SQL Based on Large Language Model. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.8

work page doi:10.18653/v1/2025.xllm-1.8 2025
[20]

Combining Automated and Manual Data for Effective Downstream Fine-Tuning of Transformers for Low-Resource Language Applications

Isaeva, Ulyana and Astafurov, Danil and Martynov, Nikita. Combining Automated and Manual Data for Effective Downstream Fine-Tuning of Transformers for Low-Resource Language Applications. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.9

work page doi:10.18653/v1/2025.xllm-1.9 2025
[21]

Seamlessly Integrating Tree-Based Positional Embeddings into Transformer Models for Source Code Representation

Bartkowiak, Patryk and Grali \'n ski, Filip. Seamlessly Integrating Tree-Based Positional Embeddings into Transformer Models for Source Code Representation. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.10

work page doi:10.18653/v1/2025.xllm-1.10 2025
[22]

Enhancing AMR Parsing with Group Relative Policy Optimization

Barta, Botond and Hamerlik, Endre and Nyist, Mil \'a n and Ito, Masato and Acs, Judit. Enhancing AMR Parsing with Group Relative Policy Optimization. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.11

work page doi:10.18653/v1/2025.xllm-1.11 2025
[23]

Structure Modeling Approach for UD Parsing of Historical M odern J apanese

Ozaki, Hiroaki and Omura, Mai and Komiya, Kanako and Asahara, Masayuki and Ogiso, Toshinobu. Structure Modeling Approach for UD Parsing of Historical M odern J apanese. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.12

work page doi:10.18653/v1/2025.xllm-1.12 2025
[24]

BARTABSA ++: Revisiting BARTABSA with Decoder LLM s

Pfister, Jan and V. BARTABSA ++: Revisiting BARTABSA with Decoder LLM s. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.13

work page doi:10.18653/v1/2025.xllm-1.13 2025
[25]

Typed- RAG : Type-Aware Decomposition of Non-Factoid Questions for Retrieval-Augmented Generation

Lee, DongGeon and Park, Ahjeong and Lee, Hyeri and Nam, Hyeonseo and Maeng, Yunho. Typed- RAG : Type-Aware Decomposition of Non-Factoid Questions for Retrieval-Augmented Generation. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.14

work page doi:10.18653/v1/2025.xllm-1.14 2025
[26]

Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction

Hellwig, Nils Constantin and Fehle, Jakob and Kruschwitz, Udo and Wolff, Christian. Do we still need Human Annotators? Prompting Large Language Models for Aspect Sentiment Quad Prediction. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.15

work page doi:10.18653/v1/2025.xllm-1.15 2025
[27]

Can LLM s Interpret and Leverage Structured Linguistic Representations? A Case Study with AMR s

Raut, Ankush and Zhu, Xiaofeng and Pacheco, Maria Leonor. Can LLM s Interpret and Leverage Structured Linguistic Representations? A Case Study with AMR s. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.16

work page internal anchor Pith review doi:10.18653/v1/2025.xllm-1.16 2025
[28]

LLM Dependency Parsing with In-Context Rules

Ginn, Michael and Palmer, Alexis. LLM Dependency Parsing with In-Context Rules. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.17

work page doi:10.18653/v1/2025.xllm-1.17 2025
[29]

Cognitive Mirroring for D oc RE : A Self-Supervised Iterative Reflection Framework with Triplet-Centric Explicit and Implicit Feedback

Han, Xu and Wang, Bo and Sun, Yueheng and Zhao, Dongming and Qu, Zongfeng and He, Ruifang and Hou, Yuexian and Hu, Qinghua. Cognitive Mirroring for D oc RE : A Self-Supervised Iterative Reflection Framework with Triplet-Centric Explicit and Implicit Feedback. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)...

work page doi:10.18653/v1/2025.xllm-1.18 2025
[30]

Cross-Document Event-Keyed Summarization

Walden, William and Kuchmiichuk, Pavlo and Martin, Alexander and Jin, Chihsheng and Cao, Angela and Sun, Claire and Allen, Curisia and White, Aaron Steven. Cross-Document Event-Keyed Summarization. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.19

work page doi:10.18653/v1/2025.xllm-1.19 2025
[31]

Transfer of Structural Knowledge from Synthetic Languages

Budnikov, Mikhail and Yamshchikov, Ivan. Transfer of Structural Knowledge from Synthetic Languages. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.20

work page doi:10.18653/v1/2025.xllm-1.20 2025
[32]

Language Models are Universal Embedders

Zhang, Xin and Li, Zehan and Zhang, Yanzhao and Long, Dingkun and Xie, Pengjun and Zhang, Meishan and Zhang, Min. Language Models are Universal Embedders. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.21

work page doi:10.18653/v1/2025.xllm-1.21 2025
[33]

D ia DP @ XLLM 25: Advancing C hinese Dialogue Parsing via Unified Pretrained Language Models and Biaffine Dependency Scoring

Duan, Shuoqiu and Chen, Xiaoliang and Miao, Duoqian and Gu, Xu and Li, Xianyong and Du, Yajun. D ia DP @ XLLM 25: Advancing C hinese Dialogue Parsing via Unified Pretrained Language Models and Biaffine Dependency Scoring. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.22

work page doi:10.18653/v1/2025.xllm-1.22 2025
[34]

LLMSR @ XLLM 25: Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation

Yuan, Jiahao and Sun, Xingzhe and Yu, Xing and Wang, Jingwen and Du, Dehui and Cui, Zhiqing and Di, Zixiang. LLMSR @ XLLM 25: Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.23

work page doi:10.18653/v1/2025.xllm-1.23 2025
[35]

S peech EE @ XLLM 25: End-to-End Structured Event Extraction from Speech

Chaudhuri, Soham and Biswas, Diganta and Saha, Dipanjan and Das, Dipankar and Bandyopadhyay, Sivaji. S peech EE @ XLLM 25: End-to-End Structured Event Extraction from Speech. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.24

work page doi:10.18653/v1/2025.xllm-1.24 2025
[36]

Luu, Son and Van Nguyen, Kiet

Pham Hoang Le, Nguyen and Dinh Thien, An and T. Luu, Son and Van Nguyen, Kiet. D oc IE @ XLLM 25: Z ero S emble - Robust and Efficient Zero-Shot Document Information Extraction with Heterogeneous Large Language Model Ensembles. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.25

work page doi:10.18653/v1/2025.xllm-1.25 2025
[37]

D oc IE @ XLLM 25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations

Popovic, Nicholas and Kangen, Ashish and Schopf, Tim and F. D oc IE @ XLLM 25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.26

work page doi:10.18653/v1/2025.xllm-1.26 2025
[38]

LLMSR @ XLLM 25: Integrating Reasoning Prompt Strategies with Structural Prompt Formats for Enhanced Logical Inference

Tai, Le and Van, Thin. LLMSR @ XLLM 25: Integrating Reasoning Prompt Strategies with Structural Prompt Formats for Enhanced Logical Inference. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.27

work page doi:10.18653/v1/2025.xllm-1.27 2025
[39]

D oc IE @ XLLM 25: UIEP rompter: A Unified Training-Free Framework for universal document-level information extraction via Structured Prompt

Qiu, Chengfeng and Zhou, Lifeng and Wei, Kaifeng and Li, Yuke. D oc IE @ XLLM 25: UIEP rompter: A Unified Training-Free Framework for universal document-level information extraction via Structured Prompt. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.28

work page doi:10.18653/v1/2025.xllm-1.28 2025
[40]

LLMSR @ XLLM 25: SWRV : Empowering Self-Verification of Small Language Models through Step-wise Reasoning and Verification

Chen, Danchun. LLMSR @ XLLM 25: SWRV : Empowering Self-Verification of Small Language Models through Step-wise Reasoning and Verification. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.29

work page doi:10.18653/v1/2025.xllm-1.29 2025
[41]

LLMSR @ XLLM 25: An Empirical Study of LLM for Structural Reasoning

Li, Xinye and Wan, Mingqi and Sui, Dianbo. LLMSR @ XLLM 25: An Empirical Study of LLM for Structural Reasoning. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.30

work page doi:10.18653/v1/2025.xllm-1.30 2025
[42]

LLMSR @ XLLM 25: A Language Model-Based Pipeline for Structured Reasoning Data Construction

Xing, Hongrui and Liu, Xinzhang and Jiang, Zhuo and Yang, Zhihao and Yao, Yitong and Wang, Zihan and Deng, Wenmin and Wang, Chao and Song, Shuangyong and Yang, Wang and He, Zhongjiang and Li, Yongxiang. LLMSR @ XLLM 25: A Language Model-Based Pipeline for Structured Reasoning Data Construction. Proceedings of the 1st Joint Workshop on Large Language Model...

work page doi:10.18653/v1/2025.xllm-1.31 2025
[43]

S peech EE @ XLLM 25: Retrieval-Enhanced Few-Shot Prompting for Speech Event Extraction

Gedeon, M \'a t \'e. S peech EE @ XLLM 25: Retrieval-Enhanced Few-Shot Prompting for Speech Event Extraction. Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). 2025. doi:10.18653/v1/2025.xllm-1.32

work page doi:10.18653/v1/2025.xllm-1.32 2025
[44]

Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

2025
[45]

An introduction to computational identification and classification of Upam \= a alaṇk \= a ra

Jadhav, Bhakti and Dutta, Himanshu and Kanitkar, Shruti and Kulkarni, Malhar and Bhattacharyya, Pushpak. An introduction to computational identification and classification of Upam \= a alaṇk \= a ra. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

2025
[46]

Aesthetics of S anskrit Poetry from the Perspective of Computational Linguistics: A Case Study Analysis on \'S ikṣ \= a ṣṭaka

Sandhan, Jivnesh and Barbadikar, Amruta and Maity, Malay and Satuluri, Pavankumar and Sandhan, Tushar and Gupta, Ravi M and Goyal, Pawan and Behera, Laxmidhar. Aesthetics of S anskrit Poetry from the Perspective of Computational Linguistics: A Case Study Analysis on \'S ikṣ \= a ṣṭaka. Computational Sanskrit and Digital Humanities - World Sanskrit Confere...

2025
[47]

Itaretara Dvandva: A challenge for Dependency Tree semantics

Kulkarni, Amba and Neelamana, Vasudha. Itaretara Dvandva: A challenge for Dependency Tree semantics. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

2025
[48]

A Case Study of Handwritten Text Recognition from Pre-Colonial era S anskrit Manuscripts

Chincholikar, Kartik and Dwivedi, Shagun and Gopalan, Kaushik and Awasthi, Tarinee. A Case Study of Handwritten Text Recognition from Pre-Colonial era S anskrit Manuscripts. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

2025
[49]

Towards Accent-Aware V edic S anskrit Optical Character Recognition Based on Transformer Models

Tsukagoshi, Yuzuki and Kuroiwa, Ryo and Ohmukai, Ikki. Towards Accent-Aware V edic S anskrit Optical Character Recognition Based on Transformer Models. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

2025
[50]

Vedavani: A Benchmark Corpus for ASR on V edic S anskrit Poetry

Kumar, Sujeet and Ray, Pretam and Beerukuri, Abhinay and Kamoji, Shrey and Jagadeeshan, Manoj Balaji and Goyal, Pawan. Vedavani: A Benchmark Corpus for ASR on V edic S anskrit Poetry. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

2025
[51]

Compound Type Identification in S anskrit

Krishnan, Sriram and Satuluri, Pavankumar and Barbadikar, Amruta and Prasanna Venkatesh, T S and Kulkarni, Amba. Compound Type Identification in S anskrit. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

2025
[52]

IKML : A Markup Language for Collaborative Semantic Annotation of I ndic Texts

Lakkundi, Chaitanya S and Rajaraman, Gopalakrishnan and Susarla, Sai Rama Krishna. IKML : A Markup Language for Collaborative Semantic Annotation of I ndic Texts. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

2025
[53]

Challenges in Processing V edic S anskrit: Towards creating a normalized dataset for the Ṛgveda-saṃhit \= a

Krishnan, Sriram and Gayathri, Sepuri and Kulkarni, Amba. Challenges in Processing V edic S anskrit: Towards creating a normalized dataset for the Ṛgveda-saṃhit \= a. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

2025
[54]

P \= a ṇḍitya: Visualizing S anskrit Intellectual Networks

Neill, Tyler. P \= a ṇḍitya: Visualizing S anskrit Intellectual Networks. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

2025
[55]

Anveshana: A New Benchmark Dataset for Cross-Lingual Information Retrieval on E nglish Queries and S anskrit Documents

Jagadeeshan, Manoj Balaji and Raj, Prince and Goyal, Pawan. Anveshana: A New Benchmark Dataset for Cross-Lingual Information Retrieval on E nglish Queries and S anskrit Documents. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

2025
[56]

Concordance of S anskrit Synonyms

Patel, Dhaval. Concordance of S anskrit Synonyms. Computational Sanskrit and Digital Humanities - World Sanskrit Conference 2025. 2025

2025
[57]

Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

2025
[58]

Chain-of- M eta W riting: Linguistic and Textual Analysis of How Small Language Models Write Young Students Texts

Buhnila, Ioana and Cislaru, Georgeta and Todirascu, Amalia. Chain-of- M eta W riting: Linguistic and Textual Analysis of How Small Language Models Write Young Students Texts. Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

2025
[59]

Semantic Masking in a Needle-in-a-haystack Test for Evaluating Large Language Model Long-Text Capabilities

Shi, Ken and Penn, Gerald. Semantic Masking in a Needle-in-a-haystack Test for Evaluating Large Language Model Long-Text Capabilities. Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

2025
[60]

Reading Between the Lines: A dataset and a study on why some texts are tougher than others

Khallaf, Nouran and Eugeni, Carlo and Sharoff, Serge. Reading Between the Lines: A dataset and a study on why some texts are tougher than others. Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

2025
[61]

P ara R ev : Building a dataset for Scientific Paragraph Revision annotated with revision instruction

Jourdan, L \'e ane and Boudin, Florian and Dufour, Richard and Hernandez, Nicolas and Aizawa, Akiko. P ara R ev : Building a dataset for Scientific Paragraph Revision annotated with revision instruction. Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

2025
[62]

Towards an operative definition of creative writing: a preliminary assessment of creativeness in AI and human texts

Maggi, Chiara and Vitaletti, Andrea. Towards an operative definition of creative writing: a preliminary assessment of creativeness in AI and human texts. Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

2025
[63]

Decoding Semantic Representations in the Brain Under Language Stimuli with Large Language Models

Sato, Anna and Kobayashi, Ichiro. Decoding Semantic Representations in the Brain Under Language Stimuli with Large Language Models. Proceedings of the First Workshop on Writing Aids at the Crossroads of AI, Cognitive Science and NLP (WRAICOGS 2025). 2025

2025
[64]

Proceedings of the 5th Wordplay: When Language meets Games Workshop (Wordplay 2025). 2025. doi:10.18653/v1/2025.wordplay-1.0

work page doi:10.18653/v1/2025.wordplay-1.0 2025
[65]

Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[66]

A Comprehensive Taxonomy of Bias Mitigation Methods for Hate Speech Detection

Fillies, Jan and Wawerek, Marius and Paschke, Adrian. A Comprehensive Taxonomy of Bias Mitigation Methods for Hate Speech Detection. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[67]

Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation

Antypas, Dimosthenis and Sen, Indira and Perez Almendros, Carla and Camacho-Collados, Jose and Barbieri, Francesco. Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[68]

From civility to parity: Marxist-feminist ethics for context-aware algorithmic content moderation

Oh, Dayei. From civility to parity: Marxist-feminist ethics for context-aware algorithmic content moderation. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[69]

A Novel Dataset for Classifying G erman Hate Speech Comments with Criminal Relevance

Kums, Vincent and Meyer, Florian and Pivit, Luisa and Vedenina, Uliana and Wortmann, Jonas and Siegel, Melanie and Labudde, Dirk. A Novel Dataset for Classifying G erman Hate Speech Comments with Criminal Relevance. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[70]

Learning from Disagreement: Entropy-Guided Few-Shot Selection for Toxic Language Detection

Caselli, Tommaso and Plaza-del-Arco, Flor Miriam. Learning from Disagreement: Entropy-Guided Few-Shot Selection for Toxic Language Detection. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[71]

Debiasing Static Embeddings for Hate Speech Detection

Sun, Ling and Kim, Soyoung and Dong, Xiao and K. Debiasing Static Embeddings for Hate Speech Detection. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[72]

Web(er) of Hate: A Survey on How Hate Speech Is Typed

Wang, Luna and Caines, Andrew and Hutchings, Alice. Web(er) of Hate: A Survey on How Hate Speech Is Typed. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[73]

Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLM s for Countering Hate Speech

Ngueajio, Mikel and Plaza-del-Arco, Flor Miriam and Chung, Yi-Ling and Rawat, Danda and Cercas Curry, Amanda. Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLM s for Countering Hate Speech. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[74]

HODIAT : A Dataset for Detecting Homotransphobic Hate Speech in I talian with Aggressiveness and Target Annotation

Damo, Greta and Cignarella, Alessandra Teresa and Caselli, Tommaso and Patti, Viviana and Nozza, Debora. HODIAT : A Dataset for Detecting Homotransphobic Hate Speech in I talian with Aggressiveness and Target Annotation. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[75]

Beyond the Binary: Analysing Transphobic Hate and Harassment Online

Talas, Anna and Hutchings, Alice. Beyond the Binary: Analysing Transphobic Hate and Harassment Online. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[76]

Evading Toxicity Detection with ASCII -art: A Benchmark of Spatial Attacks on Moderation Systems

Berezin, Sergey and Farahbakhsh, Reza and Crespi, Noel. Evading Toxicity Detection with ASCII -art: A Benchmark of Spatial Attacks on Moderation Systems. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[77]

Debunking with Dialogue? Exploring AI -Generated Counterspeech to Challenge Conspiracy Theories

Lisker, Mareike and Gottschalk, Christina and Mihaljevi \'c , Helena. Debunking with Dialogue? Exploring AI -Generated Counterspeech to Challenge Conspiracy Theories. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[78]

M isinfo T ele G raph: Network-driven Misinformation Detection for G erman Telegram Messages

Kalkbrenner, Lu and Solopova, Veronika and Zeiler, Steffen and Nickel, Robert and Kolossa, Dorothea. M isinfo T ele G raph: Network-driven Misinformation Detection for G erman Telegram Messages. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[79]

Catching Stray Balls: Football, fandom, and the impact on digital discourse

Hill, Mark. Catching Stray Balls: Football, fandom, and the impact on digital discourse. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025
[80]

e , Justina and Rimkien \

Mandravickait \. e , Justina and Rimkien \. e , Egl \. e and Petkevi c ius, Mindaugas and Songailait \. e , Milita and Zaranka, Eimantas and Krilavi c ius, Tomas. Exploring Hate Speech Detection Models for L ithuanian Language. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025

2025

Showing first 80 references.