RLSR trains source rewriters via RL with translation-quality improvement as the reward, outperforming prompt baselines at 4B scale while matching larger models.
Findings of the WMT 25 general machine translation shared task: Time to stop evaluating on easy test sets
19 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (19)
roles: other (1)
polarities: unclear (1)
representative citing papers
Ouvia is a user-centered evaluation framework for speech translation usability in real-world scenarios, showing limited usability rates and the superiority of QA-based metrics.
Automatic evaluation tools for literary translations correlate poorly with expert human judgments on creativity and exhibit bias favoring machine-translated texts.
Document-level machine translation followed by segment-level LLM refinement provides the strongest and most stable improvements in literary translation quality, mainly enhancing fluency and style rather than adequacy.
Human readers prefer human literary translations over AI-generated ones for immersion and clarity despite finding MT adequate and struggling to identify the source.
Reward models for LLMs frequently select socially undesirable options across four social domains, show no overall best performer, and exhibit a bias-avoidance versus context-sensitivity trade-off.
An empirical study finds that verbalized per-token confidence methods in LLMs for MT perform similarly to internal signals on error detection and calibration but correlate only weakly with them.
Multi-aspect iterative refinement with specialized LLMs generates superior literary translation data, enabling SFT and GRPO to produce LitMT-8B and LitMT-14B models scoring 67.25 and 69.07 CEA100 on MetaphorTrans, competitive with Claude Sonnet 4.5.
Lexical richness is a robust linguistic signal for AI-generated text detection across models and domains, while most other features are context-dependent.
Cross-lingual transfer and language-specific data efforts are interdependent and complementary for effective low-resource NLP, as demonstrated through Luxembourgish case studies and synthesis.
Compact 0.8B-7B models for bidirectional Japanese-English translation outperform large multilingual models on real-world domain benchmarks.
Introduces LLM Consumer Behavior Theory to analyze consumer behavior when LLMs serve as autonomous decision-making agents in markets.
A cascaded SimulST system using Parakeet and Qwen 3.5 with adaptive black-box policies and RAG context achieves +5.82 XCOMET-XL improvement on En→De for IWSLT 2026.
A feature-based decision tree with parsing-derived signals and heuristics detects LLM-generated code in a lightweight, CPU-only setup for SemEval-2026 Task 13.
citing papers explorer
- LLM Consumer Behavior Theory: Foundations of a Novel Research Field
Introduces LLM Consumer Behavior Theory to analyze consumer behavior when LLMs serve as autonomous decision-making agents in markets.