Transfer of Structural Knowledge from Synthetic Languages
11 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (11)
verdicts: UNVERDICTED (11)
representative citing papers
Reward models for LLMs frequently select socially undesirable options across four social domains, show no overall best performer, and exhibit a bias-avoidance versus context-sensitivity trade-off.
LLMs generate Xiaohongshu-style posts that elicit social comparison but show stable failures in prompt-based detection of the same reader-grounded signal.
Lexical richness is a robust linguistic signal for AI-generated text detection across models and domains, while most other features are context-dependent.
Cross-lingual transfer and language-specific data efforts are interdependent and complementary for effective low-resource NLP, as demonstrated through Luxembourgish case studies and synthesis.
LLM-generated ML pipelines show higher bias (87.7% sensitive attributes) than conditional statements (59.2%), indicating that simple if-statement tests underestimate bias risk in practical code generation.
Introduces LLM Consumer Behavior Theory to analyze consumer behavior when LLMs serve as autonomous decision-making agents in markets.
A feature-based decision tree with parsing-derived signals and heuristics detects LLM-generated code in a lightweight, CPU-only setup for SemEval-2026 Task 13.
Fine-tuning Qwen3-32B with data augmentation and self-training achieves a competitive 8th-place ranking on SemEval-2026 conspiracy detection.
Fine-tunes LLMs with QLoRA and multilingual data augmentation to detect polarization and classify its type and manifestation in SemEval-2026 Task 9.
Fine-tuning LLMs by adapting the mdok approach produces competitive results on binary detection, source attribution, and hybrid/adversarial code identification in SemEval-2026 Task 13.
citing papers explorer
- GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents
A matched benchmark shows GUI computer-use agents at 59.1% full pass rate versus 48.2% for original-skill CLI agents, rising to 69.3% with verifier-guided augmentation, indicating modality-specific execution bottlenecks.
- LLM Consumer Behavior Theory: Foundations of a Novel Research Field
Introduces LLM Consumer Behavior Theory to analyze consumer behavior when LLMs serve as autonomous decision-making agents in markets.