MultiSynt/MT supplies 4.8 trillion translated tokens in 36 languages from 100B English tokens, letting LLMs match native-data baselines with 72% fewer tokens and beat them by 15% at equal budget.
How Human is Machine Translationese? Comparing Human and Machine Translations of Text and Speech
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
LLM translations introduce model-specific statistically significant emotional fingerprints that limit preservation of author voice, with post-editing providing partial alignment to human norms.
citing papers explorer
-
MultiSynt/MT: Trillion-Token Multi-Parallel Pre-Training Data Translated Across 36 Languages
MultiSynt/MT supplies 4.8 trillion translated tokens in 36 languages from 100B English tokens, letting LLMs match native-data baselines with 72% fewer tokens and beat them by 15% at equal budget.
-
Emotion Profiling in LLM-Based Literary Translation: Systematic Shifts Across MT and Post-Editing
LLM translations introduce model-specific statistically significant emotional fingerprints that limit preservation of author voice, with post-editing providing partial alignment to human norms.