9 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL
Years: 2026
9 representative citing papers
Citing papers explorer
- Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations
  Automatic evaluation tools for literary translations correlate poorly with expert human judgments on creativity and exhibit bias favoring machine-translated texts.
- What Does LLM Refinement Actually Improve? A Systematic Study on Document-Level Literary Translation
  Document-level machine translation followed by segment-level LLM refinement provides the strongest and most stable improvements in literary translation quality, mainly enhancing fluency and style rather than adequacy.
- Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models
  dGRPO merges outcome-based policy optimization with dense teacher guidance from on-policy distillation, yielding more stable long-context reasoning on the new LongBlocks synthetic dataset.
- Dynamic Meta-Metrics: Source-Sentence Conditioned Weighting for MT Evaluation
  Dynamic Meta-Metrics learns source-sentence-conditioned combinations of MT metrics, with MLP-based hard and soft clustering versions outperforming static linear and Gaussian process ensembles on WMT data.
- Misaligned by Reward: Socially Undesirable Preferences in LLMs
  Reward models for LLMs frequently select socially undesirable options across four social domains, show no overall best performer, and exhibit a bias-avoidance versus context-sensitivity trade-off.
- Psychologically Potent, Computationally Invisible: LLMs Generate Social-Comparison Triggers They Fail to Detect
  LLMs generate social media posts that shift readers' perceived standing and comparison-related feelings but fail to reliably detect the same social-comparison triggers via prompt-based classification.
- Syntax as a Rosetta Stone: Universal Dependencies for In-Context Coptic Translation
  Combining dictionary glosses with Universal Dependencies syntactic information in in-context learning produces new state-of-the-art Coptic-English translation results across model sizes.
- Why Low-Resource NLP Needs More Than Cross-Lingual Transfer: Lessons Learned from Luxembourgish
  Cross-lingual transfer and language-specific data efforts are interdependent and complementary for effective low-resource NLP, as demonstrated through Luxembourgish case studies and synthesis.
- FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals
  A feature-based decision tree with parsing-derived signals and heuristics detects LLM-generated code in a lightweight, CPU-only setup for SemEval-2026 Task 13.
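The last entry describes a feature-based, CPU-only detector built on parsing-derived signals. As a minimal illustrative sketch only (the paper's actual features, thresholds, and tree structure are not given in this listing; every feature name and cutoff below is hypothetical), stylometric signals can be pulled from a Python AST and fed to a hand-written decision rule:

```python
import ast
import statistics


def stylometric_features(src: str) -> dict:
    """Extract simple parsing-derived stylometric signals from Python source."""
    tree = ast.parse(src)
    # Identifier usages (ast.Name nodes) as a proxy for naming style.
    names = [node.id for node in ast.walk(tree) if isinstance(node, ast.Name)]
    # Fraction of function definitions that carry a docstring.
    func_docs = [
        ast.get_docstring(node) is not None
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    ]
    nonblank = [line for line in src.splitlines() if line.strip()]
    return {
        "mean_name_len": statistics.mean(len(n) for n in names) if names else 0.0,
        "docstring_ratio": sum(func_docs) / len(func_docs) if func_docs else 0.0,
        "mean_line_len": statistics.mean(len(line) for line in nonblank),
    }


def looks_llm_generated(feats: dict) -> bool:
    """Toy decision tree: thresholds are illustrative, not the paper's."""
    if feats["docstring_ratio"] > 0.5 and feats["mean_name_len"] > 8:
        return True  # heavily documented + verbose identifiers
    if feats["mean_line_len"] > 60 and feats["docstring_ratio"] > 0.0:
        return True  # long, uniformly documented lines
    return False


sample = "def add(a, b):\n    return a + b\n"
print(looks_llm_generated(stylometric_features(sample)))
```

The point of the sketch is only the pipeline shape the summary describes: cheap AST-derived features followed by a threshold tree, with no GPU or model inference required.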