TasTe: Teaching Large Language Models to Translate through Self-Reflection
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CL (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
Introduces ShopTrajQA long-context benchmark and an RLVR-trained tool-augmented agent that bypasses LLM context limits by external file storage and code-based retrieval for shopping trajectories.
MADE is a new multilingual agentic diagnosing engine that produces higher-quality diagnostic reports (47% better than baseline) on a large-scale evaluation substrate covering 33 model families and 26 languages.
citing papers explorer
- Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG
  The paper characterizes deductive stereotyping in LLMs and introduces Fair-GCG to discover injection phrases that improve fairness across benchmarks, reasoning, and real-world tasks.
- Customer-Agent: Overcoming Context Limitations in Ultra-Long Shopping Trajectories via Tool-Augmented Agents and RLVR
  Introduces ShopTrajQA long-context benchmark and an RLVR-trained tool-augmented agent that bypasses LLM context limits by external file storage and code-based retrieval for shopping trajectories.
- MADE: Beyond Scoring via a Multilingual Agentic Diagnosing Engine for Fine-Grained Evaluation Insights
  MADE is a new multilingual agentic diagnosing engine that produces higher-quality diagnostic reports (47% better than baseline) on a large-scale evaluation substrate covering 33 model families and 26 languages.