pith · machine review for the scientific record

arxiv: 2604.18106 · v1 · submitted 2026-04-20 · 💻 cs.CL

Recognition: unknown

Efficient Low-Resource Language Adaptation via Multi-Source Dynamic Logit Fusion

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 05:24 UTC · model grok-4.3

classification 💻 cs.CL
keywords low-resource languages · logit fusion · language model adaptation · test-time adaptation · multi-source fusion · continual pretraining · proxy tuning

The pith

TriMix dynamically fuses logits from three sources to adapt large language models to low-resource languages without task annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes TriMix as a test-time logit fusion method that balances three inputs: LRL competence from a continually pretrained small model, task knowledge from high-resource instruction tuning, and scaling effects from large models. This addresses the failure of Proxy Tuning in LRL settings, where the large model's weak LRL competence can dominate the fusion. The approach needs only small-model continual pretraining and no LRL task data, and experiments across four model families and eight languages show consistent gains over baselines. Analysis highlights that giving priority to the small LRL-specialized model's logits drives the improvement.
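
For orientation, Proxy Tuning as usually formulated adds the logit offset between a tuned and an untuned small model to the large base model's logits, so the weak large-model term enters the sum at full strength. A minimal sketch of that arithmetic, with generic variable names rather than the paper's notation:

```python
# Sketch of the Proxy Tuning logit arithmetic (standard formulation of
# "Tuning Language Models by Proxy"; variable names are ours, not the paper's).
# The large base model's logits enter the sum with full weight, which is the
# failure mode described above when that model is weak in the target LRL.
import numpy as np

def proxy_tuning_logits(large_base: np.ndarray,
                        small_tuned: np.ndarray,
                        small_base: np.ndarray) -> np.ndarray:
    """large base + (tuned small expert - untuned small anti-expert)."""
    return large_base + (small_tuned - small_base)

# Toy example over a 5-token vocabulary.
rng = np.random.default_rng(0)
fused = proxy_tuning_logits(rng.normal(size=5), rng.normal(size=5), rng.normal(size=5))
```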

Core claim

TriMix is a test-time logit fusion framework that dynamically balances LRL competence from a continually pretrained small model, task competence from high-resource language instruction tuning, and the scaling benefits of large models; it requires no LRL task annotations and outperforms single-model baselines and Proxy Tuning across four model families and eight LRLs, with prioritizing the small LRL-specialized model's logits proving crucial.

What carries the argument

TriMix, a test-time logit fusion framework that dynamically weights and combines logits from a small continually pretrained LRL model, a high-resource tuned model, and a large model.
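
The exact weighting rule is not reproduced here (the referee report below flags that it is not spelled out), so the following is only a sketch of what a dynamic three-source fusion could look like at one decoding step, assuming a convex combination whose weights mix a fixed prior favouring the small LRL-specialized model with an entropy-based confidence signal. Both the prior values and the entropy rule are illustrative assumptions, not TriMix itself.

```python
# Hypothetical sketch of dynamic three-source logit fusion at one decoding step.
# The weighting rule (entropy-based confidence times a fixed prior favouring the
# small LRL-specialized model) is an assumption for illustration; the actual
# TriMix rule is not given in the text above.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(probs: np.ndarray) -> float:
    return float(-(probs * np.log(probs + 1e-12)).sum())

def fuse_logits(lrl_small: np.ndarray,    # continually pretrained on the LRL
                hrl_tuned: np.ndarray,    # instruction-tuned on high-resource data
                large: np.ndarray,        # large base model (scaling benefits)
                prior=(0.5, 0.3, 0.2)) -> np.ndarray:
    """Convex combination of three logit vectors with confidence-modulated weights."""
    sources = [lrl_small, hrl_tuned, large]
    # Lower predictive entropy -> higher confidence -> larger weight.
    confidence = np.array([1.0 / (1.0 + entropy(softmax(s))) for s in sources])
    weights = np.array(prior) * confidence
    weights /= weights.sum()
    return sum(w * s for w, s in zip(weights, sources))

# Toy decode step over a 5-token vocabulary.
rng = np.random.default_rng(0)
fused = fuse_logits(rng.normal(size=5), rng.normal(size=5), rng.normal(size=5))
next_token = int(np.argmax(softmax(fused)))
```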

If this is right

  • TriMix enables efficient LRL adaptation using only continual pretraining on a small model and existing high-resource tuning data.
  • The method applies across multiple model families without requiring LRL-specific task annotations.
  • Prioritizing logits from the small LRL-specialized model is necessary for the fusion to succeed, contrary to large-model-dominant assumptions.
  • The framework remains data-efficient and compute-light at inference time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Fusion weights could be learned or adapted per language or task without manual tuning, extending the dynamic aspect further.
  • Similar three-source balancing might transfer to other adaptation settings like domain shift or specialized skills where small experts complement large models.
  • Testing TriMix on even scarcer languages or with zero high-resource overlap would clarify the limits of the small-model prioritization.

Load-bearing premise

That the small model's LRL competence from continual pretraining can be dynamically balanced with the other sources without the large model's weak LRL performance overwhelming the fusion, and that this holds without any LRL task annotations.

What would settle it

An experiment on new LRLs where even optimized dynamic weighting in TriMix produces no gains over Proxy Tuning or single small-model baselines, or where forcing higher weight on the small model hurts rather than helps results.

Figures

Figures reproduced from arXiv: 2604.18106 by Chen Zhang, Jiuheng Lin, Yansong Feng, Zhiyuan Liao.

Figure 1. TriMix integrates three sources of benefit for LRL adaptation while minimizing the need for annotating task data and tuning larger models.
Figure 2. The framework of TriMix. Given a task prompt for an LRL, TriMix dynamically fuses the logits of three models to integrate language competence, task competence, and scaling benefits.
Figure 3. Performance of TriMix (PPL) across different combinations of model sizes. Scores are reported on MiLiC-Eval and averaged over four languages. Circle sizes represent the scale of training parameters.
Figure 4. The average score across all the tasks on MiLiC-Eval for each pair of hyperparameters.
Original abstract

Adapting large language models (LLMs) to low-resource languages (LRLs) is constrained by the scarcity of task data and computational resources. Although Proxy Tuning offers a logit-level strategy for introducing scaling effects, it often fails in LRL settings because the large model's weak LRL competence might overwhelm the knowledge of specialized smaller models. We thus propose TriMix, a test-time logit fusion framework that dynamically balances capabilities from three different sources: LRL competence from a continually pretrained small model, task competence from high-resource language instruction tuning, and the scaling benefits of large models. It is data- and compute-efficient, requiring no LRL task annotations, and only continual pretraining on a small model. Experiments across four model families and eight LRLs show that TriMix consistently outperforms single-model baselines and Proxy Tuning. Our analysis reveals that prioritizing the small LRL-specialized model's logits is crucial for success, challenging the prevalent large-model-dominant assumption.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes TriMix, a test-time dynamic logit fusion framework for adapting LLMs to low-resource languages without task annotations or additional training. It fuses logits from three sources—a small model continually pretrained on the target LRL, a high-resource instruction-tuned model, and the large model—claiming consistent outperformance over single-model baselines and Proxy Tuning across four model families and eight LRLs, with analysis indicating that prioritizing the small LRL-specialized model's logits is crucial.

Significance. If the dynamic fusion mechanism reliably up-weights the small model's LRL competence without supervision, the approach would offer a practical, low-resource method for LLM adaptation that avoids the dominance issues of Proxy Tuning. The multi-family, multi-language empirical evaluation provides broad support for the central claim, though the lack of mechanistic transparency limits assessment of generalizability.

major comments (2)
  1. [methods section] The description of the TriMix fusion provides no explicit equation, algorithm, or weighting rule for how the three logit sources are dynamically balanced at test time. This is load-bearing for the central claim: without a specified mechanism (e.g., entropy-based, magnitude-based, or otherwise) that demonstrably prioritizes the small model on LRL inputs, it remains unclear why large-model logits would not dominate as they do in Proxy Tuning.
  2. [experimental results] For the experiments across the four model families and eight LRLs, the abstract and results claim consistent outperformance but report no error bars, variance across seeds, or statistical significance tests. This undermines confidence in the empirical support for the claim that TriMix reliably succeeds where baselines fail.
minor comments (1)
  1. [abstract] The abstract would be strengthened by a one-sentence outline of the fusion weighting strategy to allow readers to immediately grasp the technical contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us identify areas for improvement in clarity and empirical rigor. We address each major comment below and have made corresponding revisions to the manuscript.

Point-by-point responses
  1. Referee: [methods section] The description of the TriMix fusion provides no explicit equation, algorithm, or weighting rule for how the three logit sources are dynamically balanced at test time. This is load-bearing for the central claim: without a specified mechanism (e.g., entropy-based, magnitude-based, or otherwise) that demonstrably prioritizes the small model on LRL inputs, it remains unclear why large-model logits would not dominate as they do in Proxy Tuning.

    Authors: We agree that the methods section would benefit from greater formalization. The original manuscript describes the dynamic balancing of the three logit sources in prose and demonstrates through analysis that the small LRL-specialized model is effectively prioritized. However, we acknowledge the absence of an explicit equation or algorithm for the weighting rule. In the revised manuscript, we have added a formal equation defining the fused logits as a weighted sum and included pseudocode for the test-time fusion procedure. This addition makes the dynamic mechanism explicit and shows how it mitigates large-model dominance, consistent with the empirical results and analysis already present in the paper. revision: yes

  2. Referee: [experimental results] For the experiments across the four model families and eight LRLs, the abstract and results claim consistent outperformance but report no error bars, variance across seeds, or statistical significance tests. This undermines confidence in the empirical support for the claim that TriMix reliably succeeds where baselines fail.

    Authors: We appreciate this point on statistical reporting. The original experiments were conducted with fixed seeds to ensure reproducibility across the eight languages and four model families, but we agree that variance estimates and significance testing would increase confidence in the reliability of the gains. In the revised manuscript, we have added error bars (standard deviations computed over three independent runs with different seeds) to all main result tables and figures. We have also included paired statistical significance tests (t-tests) comparing TriMix against the strongest baseline in each setting, confirming that the reported improvements are statistically significant (p < 0.05) in the majority of cases. revision: yes
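
A minimal sketch of the paired comparison described in this response, assuming one aggregate score per language for TriMix and for the strongest baseline; the values below are placeholders, not numbers from the paper.

```python
# Illustrative paired significance check of the kind the rebuttal describes.
# Scores are placeholders (one per LRL), not values from the paper.
from scipy.stats import ttest_rel

trimix_scores   = [62.1, 58.4, 55.0, 60.3, 57.8, 59.2, 61.0, 56.7]
baseline_scores = [60.5, 57.9, 53.2, 59.8, 55.1, 58.0, 60.2, 55.4]

t_stat, p_value = ttest_rel(trimix_scores, baseline_scores)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")  # improvement is significant if p < 0.05
```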

Circularity Check

0 steps flagged

No derivation chain; purely empirical comparisons

Full rationale

The paper introduces TriMix as a test-time logit fusion method and supports its claims solely through experimental results across model families and LRLs, outperforming baselines like Proxy Tuning. No equations, first-principles derivations, or parameter-fitting steps are presented that could reduce to inputs by construction. The central claims rest on observed performance gains and post-hoc analysis of logit prioritization, without self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations that substitute for independent evidence. This is a standard empirical NLP adaptation paper with no circularity in any claimed derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on domain assumptions about model competences rather than new axioms or entities; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Continual pretraining on a small model imparts usable LRL competence that can be fused with other sources
    Central to the method's ability to provide LRL knowledge without task data.
  • domain assumption Dynamic logit fusion can balance the three sources effectively at test time
    Assumed to enable outperformance without additional training or annotations.

pith-pipeline@v0.9.0 · 5467 in / 1242 out tokens · 34009 ms · 2026-05-10T05:24:15.021766+00:00 · methodology

discussion (0)

