Gonzalez, Clark Barrett, and Ying Sheng
6 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (6)
roles: background (2)
polarities: background (2)
citing papers explorer
- What Does LLM Refinement Actually Improve? A Systematic Study on Document-Level Literary Translation
  Document-level machine translation followed by segment-level LLM refinement provides the strongest and most stable improvements in literary translation quality, mainly enhancing fluency and style rather than adequacy.
- The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models
  SOB benchmark shows LLMs achieve near-perfect schema compliance but value accuracy of only 83% on text, 67% on images, and 24% on audio.
- AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration - Learning from Cheap, Optimizing Expensive
  AutoLLMResearch trains agents in a multi-fidelity LLMConfig-Gym environment formulated as a long-horizon MDP to enable cross-fidelity extrapolation for automating high-cost LLM experiment configurations.
- When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff
  Excessive SFT reduces LLM plasticity for RL; Rejuvenation restores it via base-anchored fusion and targeted neuron resets, yielding better RL performance and OOD generalization.
- MarginGate: Sparse Margin-Triggered Verification for Batch-Invariant LLM Inference
  MarginGate triggers verification only on low-margin decode steps to achieve 100% deterministic batch inference at 15-50% of the cost of always-on verification across tested models and datasets.
- CacheWeaver: Cache-Aware Evidence Ordering for Efficient Grounded RAG Inference
  CacheWeaver is a lightweight scheduling layer that orders evidence to exploit prefix caching, reducing median TTFT by 20-33% across vLLM setups while preserving answer quality.