pith · machine review for the scientific record

arxiv: 2604.18106 · v1 · submitted 2026-04-20 · 💻 cs.CL

Recognition: unknown

Efficient Low-Resource Language Adaptation via Multi-Source Dynamic Logit Fusion

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 05:24 UTC · model grok-4.3

classification 💻 cs.CL
keywords low-resource languages · logit fusion · language model adaptation · test-time adaptation · multi-source fusion · continual pretraining · proxy tuning

The pith

TriMix dynamically fuses logits from three sources to adapt large language models to low-resource languages without task annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes TriMix as a test-time logit fusion method that balances three inputs: LRL competence from a continually pretrained small model, task knowledge from high-resource instruction tuning, and scaling effects from large models. This addresses the failure of Proxy Tuning in LRL settings, where the large model's weak LRL competence can dominate the fusion. The approach needs only small-model continual pretraining and no LRL task data, and experiments across four model families and eight languages show consistent gains over baselines. Analysis highlights that giving priority to the small LRL-specialized model's logits drives the improvement.
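
For orientation, Proxy Tuning as usually formulated adds the logit offset between a tuned and an untuned small model to the large base model's logits, so the weak large-model term enters the sum at full strength. A minimal sketch of that arithmetic, with generic variable names rather than the paper's notation:

```python
# Sketch of the Proxy Tuning logit arithmetic (standard formulation of
# "Tuning Language Models by Proxy"; variable names are ours, not the paper's).
# The large base model's logits enter the sum with full weight, which is the
# failure mode described above when that model is weak in the target LRL.
import numpy as np

def proxy_tuning_logits(large_base: np.ndarray,
                        small_tuned: np.ndarray,
                        small_base: np.ndarray) -> np.ndarray:
    """large base + (tuned small expert - untuned small anti-expert)."""
    return large_base + (small_tuned - small_base)

# Toy example over a 5-token vocabulary.
rng = np.random.default_rng(0)
fused = proxy_tuning_logits(rng.normal(size=5), rng.normal(size=5), rng.normal(size=5))
```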

Core claim

TriMix is a test-time logit fusion framework that dynamically balances LRL competence from a continually pretrained small model, task competence from high-resource language instruction tuning, and the scaling benefits of large models; it requires no LRL task annotations and outperforms single-model baselines and Proxy Tuning across four model families and eight LRLs, with prioritizing the small LRL-specialized model's logits proving crucial.

What carries the argument

TriMix, a test-time logit fusion framework that dynamically weights and combines logits from a small continually pretrained LRL model, a high-resource tuned model, and a large model.
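
The exact weighting rule is not reproduced here (the referee report below flags that it is not spelled out), so the following is only a sketch of what a dynamic three-source fusion could look like at one decoding step, assuming a convex combination whose weights mix a fixed prior favouring the small LRL-specialized model with an entropy-based confidence signal. Both the prior values and the entropy rule are illustrative assumptions, not TriMix itself.

```python
# Hypothetical sketch of dynamic three-source logit fusion at one decoding step.
# The weighting rule (entropy-based confidence times a fixed prior favouring the
# small LRL-specialized model) is an assumption for illustration; the actual
# TriMix rule is not given in the text above.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(probs: np.ndarray) -> float:
    return float(-(probs * np.log(probs + 1e-12)).sum())

def fuse_logits(lrl_small: np.ndarray,    # continually pretrained on the LRL
                hrl_tuned: np.ndarray,    # instruction-tuned on high-resource data
                large: np.ndarray,        # large base model (scaling benefits)
                prior=(0.5, 0.3, 0.2)) -> np.ndarray:
    """Convex combination of three logit vectors with confidence-modulated weights."""
    sources = [lrl_small, hrl_tuned, large]
    # Lower predictive entropy -> higher confidence -> larger weight.
    confidence = np.array([1.0 / (1.0 + entropy(softmax(s))) for s in sources])
    weights = np.array(prior) * confidence
    weights /= weights.sum()
    return sum(w * s for w, s in zip(weights, sources))

# Toy decode step over a 5-token vocabulary.
rng = np.random.default_rng(0)
fused = fuse_logits(rng.normal(size=5), rng.normal(size=5), rng.normal(size=5))
next_token = int(np.argmax(softmax(fused)))
```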

If this is right

  • TriMix enables efficient LRL adaptation using only continual pretraining on a small model and existing high-resource tuning data.
  • The method applies across multiple model families without requiring LRL-specific task annotations.
  • Prioritizing logits from the small LRL-specialized model is necessary for the fusion to succeed, contrary to large-model-dominant assumptions.
  • The framework remains data-efficient and compute-light at inference time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Fusion weights could be learned or adapted per language or task without manual tuning, extending the dynamic aspect further.
  • Similar three-source balancing might transfer to other adaptation settings like domain shift or specialized skills where small experts complement large models.
  • Testing TriMix on even scarcer languages or with zero high-resource overlap would clarify the limits of the small-model prioritization.

Load-bearing premise

That the small model's LRL competence from continual pretraining can be dynamically balanced with the other sources without the large model's weak LRL performance overwhelming the fusion, and that this holds without any LRL task annotations.

What would settle it

An experiment on new LRLs where even optimized dynamic weighting in TriMix produces no gains over Proxy Tuning or single small-model baselines, or where forcing higher weight on the small model hurts rather than helps results.

Figures

Figures reproduced from arXiv: 2604.18106 by Chen Zhang, Jiuheng Lin, Yansong Feng, Zhiyuan Liao.

Figure 1. TriMix integrates three sources of benefit for LRL adaptation while minimizing the need for annotating task data and tuning larger models.
Figure 2. The framework of TriMix. Given a task prompt for an LRL, TriMix dynamically fuses the logits of three models to integrate language competence, task competence, and scaling benefits.
Figure 3. Performance of TriMix (PPL) across different combinations of model sizes. Scores are reported on MiLiC-Eval and averaged over four languages. Circle sizes represent the scale of training parameters.
Figure 4. The average score across all the tasks on MiLiC-Eval for each pair of hyperparameters.
Original abstract

Adapting large language models (LLMs) to low-resource languages (LRLs) is constrained by the scarcity of task data and computational resources. Although Proxy Tuning offers a logit-level strategy for introducing scaling effects, it often fails in LRL settings because the large model's weak LRL competence might overwhelm the knowledge of specialized smaller models. We thus propose TriMix, a test-time logit fusion framework that dynamically balances capabilities from three different sources: LRL competence from a continually pretrained small model, task competence from high-resource language instruction tuning, and the scaling benefits of large models. It is data- and compute-efficient, requiring no LRL task annotations, and only continual pretraining on a small model. Experiments across four model families and eight LRLs show that TriMix consistently outperforms single-model baselines and Proxy Tuning. Our analysis reveals that prioritizing the small LRL-specialized model's logits is crucial for success, challenging the prevalent large-model-dominant assumption.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes TriMix, a test-time dynamic logit fusion framework for adapting LLMs to low-resource languages without task annotations or additional training. It fuses logits from three sources—a small model continually pretrained on the target LRL, a high-resource instruction-tuned model, and the large model—claiming consistent outperformance over single-model baselines and Proxy Tuning across four model families and eight LRLs, with analysis indicating that prioritizing the small LRL-specialized model's logits is crucial.

Significance. If the dynamic fusion mechanism reliably up-weights the small model's LRL competence without supervision, the approach would offer a practical, low-resource method for LLM adaptation that avoids the dominance issues of Proxy Tuning. The multi-family, multi-language empirical evaluation provides broad support for the central claim, though the lack of mechanistic transparency limits assessment of generalizability.

major comments (2)
  1. [methods section] The description of the TriMix fusion provides no explicit equation, algorithm, or weighting rule for how the three logit sources are dynamically balanced at test time. This is load-bearing for the central claim: without a specified mechanism (e.g., entropy-based, magnitude-based, or otherwise) that demonstrably prioritizes the small model on LRL inputs, it remains unclear why large-model logits would not dominate as they do in Proxy Tuning.
  2. [experimental results] For the experiments across the four model families and eight LRLs, the abstract and results claim consistent outperformance but report no error bars, variance across seeds, or statistical significance tests. This undermines confidence in the empirical support for the claim that TriMix reliably succeeds where baselines fail.
minor comments (1)
  1. [abstract] The abstract would be strengthened by a one-sentence outline of the fusion weighting strategy to allow readers to immediately grasp the technical contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us identify areas for improvement in clarity and empirical rigor. We address each major comment below and have made corresponding revisions to the manuscript.

Point-by-point responses
  1. Referee: [methods section] The description of the TriMix fusion provides no explicit equation, algorithm, or weighting rule for how the three logit sources are dynamically balanced at test time. This is load-bearing for the central claim: without a specified mechanism (e.g., entropy-based, magnitude-based, or otherwise) that demonstrably prioritizes the small model on LRL inputs, it remains unclear why large-model logits would not dominate as they do in Proxy Tuning.

    Authors: We agree that the methods section would benefit from greater formalization. The original manuscript describes the dynamic balancing of the three logit sources in prose and demonstrates through analysis that the small LRL-specialized model is effectively prioritized. However, we acknowledge the absence of an explicit equation or algorithm for the weighting rule. In the revised manuscript, we have added a formal equation defining the fused logits as a weighted sum and included pseudocode for the test-time fusion procedure. This addition makes the dynamic mechanism explicit and shows how it mitigates large-model dominance, consistent with the empirical results and analysis already present in the paper. revision: yes

  2. Referee: [experimental results] For the experiments across the four model families and eight LRLs, the abstract and results claim consistent outperformance but report no error bars, variance across seeds, or statistical significance tests. This undermines confidence in the empirical support for the claim that TriMix reliably succeeds where baselines fail.

    Authors: We appreciate this point on statistical reporting. The original experiments were conducted with fixed seeds to ensure reproducibility across the eight languages and four model families, but we agree that variance estimates and significance testing would increase confidence in the reliability of the gains. In the revised manuscript, we have added error bars (standard deviations computed over three independent runs with different seeds) to all main result tables and figures. We have also included paired statistical significance tests (t-tests) comparing TriMix against the strongest baseline in each setting, confirming that the reported improvements are statistically significant (p < 0.05) in the majority of cases. revision: yes
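
A minimal sketch of the paired comparison described in this response, assuming one aggregate score per language for TriMix and for the strongest baseline; the values below are placeholders, not numbers from the paper.

```python
# Illustrative paired significance check of the kind the rebuttal describes.
# Scores are placeholders (one per LRL), not values from the paper.
from scipy.stats import ttest_rel

trimix_scores   = [62.1, 58.4, 55.0, 60.3, 57.8, 59.2, 61.0, 56.7]
baseline_scores = [60.5, 57.9, 53.2, 59.8, 55.1, 58.0, 60.2, 55.4]

t_stat, p_value = ttest_rel(trimix_scores, baseline_scores)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")  # improvement is significant if p < 0.05
```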

Circularity Check

0 steps flagged

No derivation chain; purely empirical comparisons

Full rationale

The paper introduces TriMix as a test-time logit fusion method and supports its claims solely through experimental results across model families and LRLs, outperforming baselines like Proxy Tuning. No equations, first-principles derivations, or parameter-fitting steps are presented that could reduce to inputs by construction. The central claims rest on observed performance gains and post-hoc analysis of logit prioritization, without self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations that substitute for independent evidence. This is a standard empirical NLP adaptation paper with no circularity in any claimed derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on domain assumptions about model competences rather than new axioms or entities; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Continual pretraining on a small model imparts usable LRL competence that can be fused with other sources
    Central to the method's ability to provide LRL knowledge without task data.
  • domain assumption Dynamic logit fusion can balance the three sources effectively at test time
    Assumed to enable outperformance without additional training or annotations.

pith-pipeline@v0.9.0 · 5467 in / 1242 out tokens · 34009 ms · 2026-05-10T05:24:15.021766+00:00 · methodology

discussion (0)

