Exploring Cross-lingual Latent Transplantation: Mutual Opportunities and Open Challenges

Bing Qin; Dandan Tu; Duyu Tang; Lei Huang; Libo Qin; Qichen Hong; Weitao Ma; Xiachong Feng; Xiaocheng Feng; Xiaohui Yan

arxiv: 2412.12686 · v3 · submitted 2024-12-17 · 💻 cs.CL

Exploring Cross-lingual Latent Transplantation: Mutual Opportunities and Open Challenges

Yangfan Ye , Xiaocheng Feng , Xiachong Feng , Libo Qin , Yichong Huang , Lei Huang , Weitao Ma , Qichen Hong

show 6 more authors

Zhirui Zhang Yunfei Lu Xiaohui Yan Duyu Tang Dandan Tu Bing Qin

This is my paper

Pith reviewed 2026-05-23 07:02 UTC · model grok-4.3

classification 💻 cs.CL

keywords cross-lingual latent transplantationmultilingual LLMscultural adaptabilitylatent activationslow-resource languagesattention modulesfeed-forward modulesXTransplant

0 comments

The pith

Cross-lingual latent transplantation improves multilingual capability and cultural adaptability in LLMs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces XTransplant, a probing framework that transplants latent activations across languages during inference to let models combine complementary strengths from English and non-English resources. This form of cross-lingual interaction produces mutual gains in multilingual task performance and cultural adaptability, with larger benefits for low-resource languages and cultures. The authors further show that attention modules drive multilingual understanding while feed-forward modules handle culture-specific knowledge. Experiments also indicate that current LLMs leave substantial multilingual potential unused.

Core claim

XTransplant is a probing framework that transplants latent activations across languages to harness complementary strengths of English and non-English resources. Empirical analysis shows this cross-lingual interaction has mutually beneficial effects on multilingual capability and cultural adaptability of LLMs, particularly for low-resource languages and cultures. Attention modules play a pivotal role in multilingual understanding, while feed-forward modules capture culture-specific knowledge. The work exposes considerable underutilization of current LLMs' multilingual potential.

What carries the argument

XTransplant framework that transplants latent activations across languages during inference.

If this is right

XTransplant yields mutual improvements in multilingual capability for both high- and low-resource languages.
XTransplant yields mutual improvements in cultural adaptability, especially for low-resource cultures.
Attention modules support multilingual understanding.
Feed-forward modules are more effective at capturing culture-specific knowledge.
Current LLMs leave substantial internalized multilingual knowledge underutilized.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same activation-transplant approach could be tested on tasks that cross other boundaries such as domain or modality.
Stability results from the paper suggest XTransplant might be combined with existing alignment methods to reduce English-centric bias without full retraining.
The exposed performance gap implies that inference-time interventions may be a cheaper route to multilingual gains than additional pre-training data collection.

Load-bearing premise

Observed performance changes result specifically from transplanting latent activations rather than from other uncontrolled factors in the experimental procedure.

What would settle it

A controlled run in which transplanting the same activations produces no performance change or produces degradation after matching all other experimental variables.

Figures

Figures reproduced from arXiv: 2412.12686 by Bing Qin, Dandan Tu, Duyu Tang, Lei Huang, Libo Qin, Qichen Hong, Weitao Ma, Xiachong Feng, Xiaocheng Feng, Xiaohui Yan, Yangfan Ye, Yichong Huang, Yunfei Lu, Zhirui Zhang.

**Figure 2.** Figure 2: 2 0 60 80 100 60 80 100 60 80 1000 5 1 5 3 2 2 mance N ance N nce N qwen Win R Win Ra Win Rat [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Layer-wise average results across different LLMs and 20 Acc 10 Accu 0 ns / 0 1 23 45 6 789 10111213 14 1516 1718 19 20 21 222324 2526 27 28 29 30 31 32 Ac 12 3 456 78 9 1011 12 13 14 Dec 5 0 / D 5 0 ains 0 1 1 1 1 111 1 1 1 222 2 2 2 2 2 2 2 3 3 3 Source Layer A 1 11 1 1 Tar 0 ns / D 5 1 23 4 5 67 8 9 10 11 12 13 14 15 16 17 1819 20 21 22 23 2425 26 272829 30 3132 20 Acc 1 234567 8 9 10 11 12 1314 10 [PIT… view at source ↗

**Figure 4.** Figure 4: The layer-wise instance-aware upper bound results across different LLMs and [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 2.** Figure 2: 2 0 60 80 100 60 80 100 60 80 1000 5 1 5 3 2 2 mance N ance N nce N qwen Win R Win Ra Win Rat inTieL nt (Self-Attention) (Self-Attention) Self-Attention)X .3 40 60 80 100 60 80 100 60 80 1000 .1 1 2 .7 .1 1 3 Win [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗

**Figure 8.** Figure 8: A intermediate decoding case study of transplanting the feed forward activations from [PITH_FULL_IMAGE:figures/full_fig_p019_8.png] view at source ↗

**Figure 3.** Figure 3: Layerwise average results across different LLMs and PilotSets. 1 2 34 5 67 8 9 0 1 23 45 6 78901 2 3 4 5 67 89 0 1 2 20 A p 1 2 3 4 5 6 7 8 9 01 2 3 45 67 8 9 0 1 2 3 4 5 6 7 10 (The results of Qwen2-7B-Instruct on XCOPA in Figure 3 show anomalous fluctuationswhich are 20 Ac Filipino 10 Target Layer A Target Layer 20 ains Amharic 0 0 ains Amharic 0 0 s / D Chinese Ahi 5 0 ns / Chinese 0 Source Layer Target… view at source ↗

read the original abstract

Current large language models (LLMs) often exhibit imbalances in multilingual capabilities and cultural adaptability, largely attributed to their English-centric pre-training data. In this paper, we introduce and investigate cross-lingual latent transplantation (XTransplant), a probing framework which aims to further exploit the model's internalized multilingual knowledge during inference and examine its effects on the multilingual capability and cultural adaptability of LLMs. XTransplant framework enables models to harness the complementary strengths of both English and non-English resources by transplanting latent activations across languages. Through extensive analysis, we empirically demonstrate that XTransplant, a form of cross-lingual interaction, has mutually beneficial effects on the multilingual capability and cultural adaptability of LLMs, particularly for low-resource languages and cultures. We further reveal that attention modules play a pivotal role in supporting multilingual understanding, while feed-forward modules are more adept at capturing culture-specific knowledge. In addition, we conduct in-depth analysis of XTransplant's stability, effectiveness, and generalizability. By probing the upper bound performance of XTransplant, we expose the considerable underutilization of current LLMs' multilingual potential-a challenge that remains open. We hope our analysis offers a new lens for advancing cross-lingual interactions and better leveraging models' internalized multilingual knowledge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

XTransplant reports mutual gains from cross-lingual activation swaps but does not isolate the cause from other factors.

read the letter

The key thing to know is that this paper introduces XTransplant, a method for transplanting latent activations across languages at inference time, and reports that it produces mutual benefits for multilingual performance and cultural adaptability, with particular gains for low-resource languages. They also map attention modules to understanding and feed-forward modules to culture-specific knowledge. The framework is presented as new, and the empirical findings on mutual effects and module roles are the fresh parts. The paper does well in conducting analyses of stability, effectiveness, and generalizability, and in highlighting the underutilization of multilingual potential in current models. That last point is a useful reminder for the field. The main soft spot is the lack of evidence isolating the transplantation as the cause of the gains. The abstract describes mutually beneficial effects from cross-lingual interaction but gives no sign of controls such as same-language transplants, random activation changes, or comparisons to non-transplant baselines. This leaves open the possibility that other factors in the procedure are responsible. The module role analysis is a nice addition but does not address this attribution problem. Because the review is abstract-based, we also cannot assess the actual experimental details or statistical rigor. This paper targets researchers focused on multilingual and cross-cultural aspects of LLMs. A reader who works on activation-based interventions or probing would find the module breakdown and the open challenges section worth their time. It deserves a serious referee because the core idea is testable and relevant to improving global accessibility of LLMs, even though the current writeup needs stronger experimental design to support the claims. I would recommend sending it to peer review. The work raises worthwhile questions about how to better leverage internalized knowledge, and feedback on the methods would help strengthen it.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces XTransplant, a probing framework that transplants latent activations across languages during inference in LLMs. It claims this cross-lingual interaction yields mutually beneficial effects on multilingual capability and cultural adaptability (especially for low-resource languages), identifies attention modules as key for multilingual understanding and FFN modules for culture-specific knowledge, analyzes stability/effectiveness/generalizability, and concludes that current LLMs underutilize their multilingual potential.

Significance. If the reported gains are causally due to transplantation and replicate under controls, the work would offer an inference-time method to exploit internalized multilingual knowledge without retraining, with the module-role findings and upper-bound analysis providing concrete directions for future cross-lingual interaction research.

major comments (1)

[Experimental results and analysis] The central claim that XTransplant produces mutually beneficial effects requires isolating the contribution of cross-lingual activation transplantation from other factors (e.g., changes in activation statistics or inference dynamics). The experimental sections do not describe matched controls such as same-language transplantation, random activation swaps, or frozen-module baselines that would rule out these alternatives.

minor comments (1)

[Introduction / Method] Notation for the transplanted activations and the precise definition of 'mutually beneficial' (e.g., symmetric improvement thresholds) should be formalized earlier to aid reproducibility.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for highlighting the importance of rigorous controls to support the central claims regarding XTransplant's effects. We address the major comment point-by-point below.

read point-by-point responses

Referee: [Experimental results and analysis] The central claim that XTransplant produces mutually beneficial effects requires isolating the contribution of cross-lingual activation transplantation from other factors (e.g., changes in activation statistics or inference dynamics). The experimental sections do not describe matched controls such as same-language transplantation, random activation swaps, or frozen-module baselines that would rule out these alternatives.

Authors: We agree that additional matched controls are needed to more convincingly isolate the contribution of cross-lingual transplantation. In the revised manuscript we will add (1) same-language transplantation baselines, (2) random activation swap controls, and (3) frozen-module ablations where feasible. These will be reported alongside the existing stability, effectiveness, and generalizability analyses to strengthen the causal interpretation of the mutual benefits. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical claims rest on experimental measurements, not self-referential definitions or fitted inputs

full rationale

The paper introduces XTransplant as an empirical probing framework and reports observed performance changes on multilingual and cultural metrics. No equations, derivations, parameter-fitting steps, or self-citation chains appear in the abstract or described structure. Central claims are presented as outcomes of transplantation experiments rather than quantities defined in terms of the inputs themselves. No self-definitional, fitted-prediction, or ansatz-smuggling patterns are present. The work is self-contained against external benchmarks as an observational study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is abstract-only, so the ledger is limited to what is stated in the abstract. The central claim rests on the domain assumption that LLMs have already internalized usable multilingual and cultural knowledge in their latent activations.

axioms (1)

domain assumption LLMs internalize multilingual knowledge and cultural adaptability during pre-training that can be accessed and transferred via latent activations
Explicitly stated in the abstract as the motivation and mechanism for XTransplant.

pith-pipeline@v0.9.0 · 5798 in / 1215 out tokens · 66209 ms · 2026-05-23T07:02:16.079774+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
cs.LG 2026-04 unverdicted novelty 6.0

COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · cited by 1 Pith paper · 9 internal anchors

[2]

Artetxe, S

M. Artetxe, S. Ruder, and D. Yogatama. On the cross-lingual transferability of monolingual representations. In D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, editors,Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4623–4637, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/...

work page doi:10.18653/v1/2020.acl-main 2020
[3]

URL https://aclanthology.org/2020.acl-main.421

work page 2020
[4]

Biderman, H

S. Biderman, H. Schoelkopf, Q. G. Anthony, H. Bradley, K. O’Brien, E. Hallahan, M. A. Khan, S. Purohit, U. S. Prashanth, E. Raff, et al. Pythia: A suite for analyzing large language models across training and scaling. In International Conference on Machine Learning, pages 2397–2430. PMLR, 2023

work page 2023
[5]

Brown, B

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020

work page 1901
[6]

Cahyawijaya, H

S. Cahyawijaya, H. Lovenia, T. Yu, W. Chung, and P. Fung. InstructAlign: High-and-low resource language alignment via continual crosslingual instruction tuning. In D. Wijaya, A. F. Aji, C. Vania, G. I. Winata, and A. Purwarianti, editors, Proceedings of the First Workshop in South East Asian Language Processing , pages 55–78, Nusa Dua, Bali, Indonesia, Nov

work page
[7]

doi: 10.18653/v1/2023.sealp-1.5

Association for Computational Linguistics. doi: 10.18653/v1/2023.sealp-1.5. URL https://aclanthology.org/2023.sealp-1.5

work page doi:10.18653/v1/2023.sealp-1.5 2023
[8]

P. Chen, S. Ji, N. Bogoychev, A. Kutuzov, B. Haddow, and K. Heafield. Monolingual or multilingual instruction tuning: Which makes a better alpaca. In Y . Graham and M. Purver, editors, Findings of the Association for Computational Linguistics: EACL 2024, pages 1347– 1356, St. Julian’s, Malta, Mar. 2024. Association for Computational Linguistics. URL https...

work page 2024
[9]

Z. Chen, F. Jiang, J. Chen, T. Wang, F. Yu, G. Chen, H. Zhang, J. Liang, C. Zhang, Z. Zhang, et al. Phoenix: Democratizing chatgpt across languages. arXiv preprint arXiv:2304.10453, 2023

work page arXiv 2023
[10]

Conneau and G

A. Conneau and G. Lample. Cross-lingual language model pretraining. Advances in neural information processing systems, 32, 2019

work page 2019
[11]

Conneau, R

A. Conneau, R. Rinott, G. Lample, A. Williams, S. R. Bowman, H. Schwenk, and V . Stoyanov. Xnli: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Confer- ence on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018

work page 2018
[12]

Conneau, K

A. Conneau, K. Khandelwal, N. Goyal, V . Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V . Stoyanov. Unsupervised cross-lingual representation learning at scale. In D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, editors,Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages 8440–8451,...

work page
[13]

Unsupervised Cross-lingual Representation Learning at Scale , booktitle =

Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.747. URL https://aclanthology.org/2020.acl-main.747

work page doi:10.18653/v1/2020.acl-main.747 2020
[14]

Y . Cui, Z. Yang, and X. Yao. Efficient and effective text encoding for chinese llama and alpaca. arXiv preprint arXiv:2304.08177, 2023

work page arXiv 2023
[15]

BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In J. Burstein, C. Doran, and T. Solorio, editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Pa- p...

work page doi:10.18653/v1/n19-1423 2019
[16]

Q. Dong, L. Li, D. Dai, C. Zheng, Z. Wu, B. Chang, X. Sun, J. Xu, and Z. Sui. A survey for in-context learning. ArXiv preprint, abs/2301.00234, 2023. URL https://arxiv.org/abs/ 2301.00234

work page internal anchor Pith review Pith/arXiv arXiv 2023
[17]

Towards Measuring the Representation of Subjective Global Opinions in Language Models

E. Durmus, K. Nguyen, T. I. Liao, N. Schiefer, A. Askell, A. Bakhtin, C. Chen, Z. Hatfield- Dodds, D. Hernandez, N. Joseph, et al. Towards measuring the representation of subjective global opinions in language models. arXiv preprint arXiv:2306.16388, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[18]

C. Gao, H. Hu, P. Hu, J. Chen, J. Li, and S. Huang. Multilingual pretraining and instruc- tion tuning improve cross-lingual knowledge alignment, but only shallowly. arXiv preprint arXiv:2404.04659, 2024

work page arXiv 2024
[19]

Chinese-mixtral-8x7b: An open-source mixture-of-experts llm

HIT-SCIR. Chinese-mixtral-8x7b: An open-source mixture-of-experts llm. https://github. com/HIT-SCIR/Chinese-Mixtral-8x7B , 2024

work page 2024
[20]

S. R. Indurthi, W. Zhou, S. Chollampatt, R. Agrawal, K. Song, L. Zhao, and C. Zhu. Improving multilingual instruction finetuning via linguistically natural and diverse datasets. In Y . Al- Onaizan, M. Bansal, and Y .-N. Chen, editors,Findings of the Association for Computational Linguistics: EMNLP 2024, pages 2306–2323, Miami, Florida, USA, Nov. 2024. Ass...

work page doi:10.18653/v1/2024.findings-emnlp.128 2024
[22]

A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao, T. Lavril, T. Wang, T. Lacroix, and W. E. Sayed. Mistral 7b, 2023. URL https://arxiv.org/abs/ 2310.06825

work page internal anchor Pith review Pith/arXiv arXiv 2023
[23]

T. Kew, F. Schottmann, and R. Sennrich. Turning English-centric LLMs into polyglots: How much multilinguality is needed? In Y . Al-Onaizan, M. Bansal, and Y .-N. Chen, editors,Findings of the Association for Computational Linguistics: EMNLP 2024, pages 13097–13124, Miami, Florida, USA, Nov. 2024. Association for Computational Linguistics. doi: 10.18653/v1...

work page doi:10.18653/v1/2024 2024
[24]

Khurana, N

S. Khurana, N. Dawalatabad, A. Laurent, L. Vicente, P. Gimeno, V . Mingote, and J. Glass. Cross-lingual transfer learning for low-resource speech translation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

work page 2024
[25]

Kojima, I

T. Kojima, I. Okimura, Y . Iwasawa, H. Yanaka, and Y . Matsuo. On the multilingual ability of decoder-based pre-trained language models: Finding and controlling language-specific neurons. arXiv preprint arXiv:2404.02431, 2024

work page arXiv 2024
[26]

Kovaˇc, M

G. Kovaˇc, M. Sawayama, R. Portelas, C. Colas, P. F. Dominey, and P.-Y . Oudeyer. Large language models as superpositions of cultural perspectives. arXiv preprint arXiv:2307.07870, 2023

work page arXiv 2023
[27]

C. Li, M. Chen, J. Wang, S. Sitaram, and X. Xie. Culturellm: Incorporating cultural differences into large language models. arXiv preprint arXiv:2402.10946, 2024

work page arXiv 2024
[28]

J. Li, S. Huang, X. Dai, and J. Chen. Prealign: Boosting cross-lingual transfer by early establishment of multilingual alignment. arXiv preprint arXiv:2407.16222, 2024

work page arXiv 2024
[29]

P. Lin, S. Ji, J. Tiedemann, A. F. Martins, and H. Schütze. Mala-500: Massive language adaptation of large language models. arXiv preprint arXiv:2401.13303, 2024

work page arXiv 2024
[30]

X. V . Lin, T. Mihaylov, M. Artetxe, T. Wang, S. Chen, D. Simig, M. Ott, N. Goyal, S. Bhos- ale, J. Du, et al. Few-shot learning with multilingual language models. arXiv preprint arXiv:2112.10668, 2021. 11

work page arXiv 2021
[31]

Lin and Y .-N

Y .-T. Lin and Y .-N. Chen. Taiwan llm: Bridging the linguistic divide with a culturally aligned language model. arXiv preprint arXiv:2311.17487, 2023

work page arXiv 2023
[32]

P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9):1–35, 2023

work page 2023
[33]

Crosslingual generalization through multitask ﬁnetuning

N. Muennighoff, T. Wang, L. Sutawika, A. Roberts, S. Biderman, T. L. Scao, M. S. Bari, S. Shen, Z.-X. Yong, H. Schoelkopf, et al. Crosslingual generalization through multitask finetuning. arXiv preprint arXiv:2211.01786, 2022

work page arXiv 2022
[34]

Nguyen, W

X.-P. Nguyen, W. Zhang, X. Li, M. Aljunied, Z. Hu, C. Shen, Y . K. Chia, X. Li, J. Wang, Q. Tan, L. Cheng, G. Chen, Y . Deng, S. Yang, C. Liu, H. Zhang, and L. Bing. SeaLLMs - large language models for Southeast Asia. In Y . Cao, Y . Feng, and D. Xiong, editors,Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume...

work page 2024
[35]

Pires, H

R. Pires, H. Abonizio, T. S. Almeida, and R. Nogueira. Sabiá: Portuguese large language models. In Brazilian Conference on Intelligent Systems, pages 226–240. Springer, 2023

work page 2023
[36]

L. Qin, Q. Chen, Y . Zhou, Z. Chen, Y . Li, L. Liao, M. Li, W. Che, and P. S. Yu. Multilin- gual large language model: A survey of resources, taxonomy and frontiers. arXiv preprint arXiv:2404.04925, 2024

work page arXiv 2024
[37]

Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Xin Zhao, and Ji-Rong Wen

A. Ramezani and Y . Xu. Knowledge of cultural moral norms in large language models. In A. Rogers, J. Boyd-Graber, and N. Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages 428–446, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2...

work page doi:10.18653/v1/2023 2023
[38]

A. Rao, A. Yerukola, V . Shah, K. Reinecke, and M. Sap. Normad: A benchmark for measuring the cultural adaptability of large language models. arXiv preprint arXiv:2404.12464, 2024

work page arXiv 2024
[39]

A. S. Rao, A. Khandelwal, K. Tanmay, U. Agarwal, and M. Choudhury. Ethical reason- ing over moral alignment: A case and framework for in-context ethical policies in LLMs. In H. Bouamor, J. Pino, and K. Bali, editors, Findings of the Association for Compu- tational Linguistics: EMNLP 2023 , pages 13370–13388, Singapore, Dec. 2023. Associ- ation for Computa...

work page doi:10.18653/v1/2023.findings-emnlp.892 2023
[40]

Reid and M

M. Reid and M. Artetxe. On the role of parallel data in cross-lingual transfer learning. arXiv preprint arXiv:2212.10173, 2022

work page arXiv 2022
[41]

T. L. Scao, A. Fan, C. Akiki, E. Pavlick, S. Ili ´c, D. Hesslow, R. Castagné, A. S. Luccioni, F. Yvon, M. Gallé, et al. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[42]

Shanahan

M. Shanahan. Talking about large language models. ArXiv preprint, abs/2212.03551, 2022. URL https://arxiv.org/abs/2212.03551

work page arXiv 2022
[43]

W. Shi, R. Li, Y . Zhang, C. Ziems, R. Horesh, R. A. de Paula, D. Yang, et al. Culturebank: An online community-driven knowledge base towards culturally aware language technologies. arXiv preprint arXiv:2404.15238, 2024

work page arXiv 2024
[44]

Singh, F

S. Singh, F. Vargus, D. Dsouza, B. F. Karlsson, A. Mahendiran, W.-Y . Ko, H. Shandilya, J. Patel, D. Mataciunas, L. OMahony, M. Zhang, R. Hettiarachchi, J. Wilson, M. Machado, L. S. Moura, D. Krzemi´nski, H. Fadaei, I. Ergün, I. Okoh, A. Alaagib, O. Mudannayake, Z. Alyafeai, V . M. Chien, S. Ruder, S. Guthikonda, E. A. Alghamdi, S. Gehrmann, N. Muennighof...

work page 2024
[45]

T. Tang, W. Luo, H. Huang, D. Zhang, X. Wang, X. Zhao, F. Wei, and J.-R. Wen. Language- specific neurons: The key to multilingual capabilities in large language models. arXiv preprint arXiv:2402.16438, 2024

work page arXiv 2024
[46]

LLaMA: Open and Efficient Foundation Language Models

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[47]

W. Wang, W. Jiao, J. Huang, R. Dai, J.-t. Huang, Z. Tu, and M. Lyu. Not all countries celebrate thanksgiving: On the cultural dominance in large language models. In L.-W. Ku, A. Martins, and V . Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6349–6384, Bangkok, Thai...

work page 2024
[48]

Z. Wang, Z. C. Lipton, and Y . Tsvetkov. On negative interference in multilingual models: Findings and a meta-learning treatment. In B. Webber, T. Cohn, Y . He, and Y . Liu, editors, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4438–4450, Online, Nov. 2020. Association for Computational Linguis- tic...

work page doi:10.18653/v1/2020.emnlp-main.359 2020
[49]

J. Wei, Y . Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, et al. Emergent abilities of large language models. ArXiv preprint, abs/2206.07682, 2022. URL https://arxiv.org/abs/2206.07682

work page internal anchor Pith review Pith/arXiv arXiv 2022
[50]

J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V . Le, D. Zhou, et al. Chain-of- thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022

work page 2022
[51]

A. Yang, B. Yang, B. Hui, B. Zheng, B. Yu, C. Zhou, C. Li, C. Li, D. Liu, F. Huang, G. Dong, H. Wei, H. Lin, J. Tang, J. Wang, J. Yang, J. Tu, J. Zhang, J. Ma, J. Yang, J. Xu, J. Zhou, J. Bai, J. He, J. Lin, K. Dang, K. Lu, K. Chen, K. Yang, M. Li, M. Xue, N. Ni, P. Zhang, P. Wang, R. Peng, R. Men, R. Gao, R. Lin, S. Wang, S. Bai, S. Tan, T. Zhu, T. Li, T...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[52]

J. Ye, X. Tao, and L. Kong. Language versatilists vs. specialists: An empirical revisiting on multilingual transfer ability. arXiv preprint arXiv:2306.06688, 2023

work page arXiv 2023
[53]

Y . Ye, X. Feng, X. Feng, W. Ma, L. Qin, D. Xu, Q. Yang, H. Liu, and B. Qin. GlobeSumm: A challenging benchmark towards unifying multi-lingual, cross-lingual and multi-document news summarization. In Y . Al-Onaizan, M. Bansal, and Y .-N. Chen, editors,Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 10803–10821...

work page 2024
[54]

OPT: Open Pre-trained Transformer Language Models

S. Zhang, S. Roller, N. Goyal, M. Artetxe, M. Chen, S. Chen, C. Dewan, M. Diab, X. Li, X. V . Lin, et al. Opt: Open pre-trained transformer language models.arXiv preprint arXiv:2205.01068, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[55]

W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y . Hou, Y . Min, B. Zhang, J. Zhang, Z. Dong, et al. A survey of large language models. ArXiv preprint, abs/2303.18223, 2023. URL https://arxiv.org/abs/2303.18223

work page internal anchor Pith review Pith/arXiv arXiv 2023
[56]

Y . Zhao, W. Zhang, G. Chen, K. Kawaguchi, and L. Bing. How do large language models handle multilingualism? arXiv preprint arXiv:2402.18815, 2024

work page arXiv 2024
[59]

Win, Tie, Lose

URLhttps://aclanthology.org/2020.acl-main.421 .279 9 llama Datasets XNLI 30.1 33.2 30.5 XQuAD 33.5 31.3 31.9 Global OpinionQA 32.1 25.8 29.0 mistral Datasets XNLI 37.7 40.3 38.0 XQuAD 39.8 35.9 39.9 Global OpinionQA 68.3 66.4 66.5 qwen Datasets XNLI 55.2 55.2 54.4 XQuAD 47.3 44.2 45.5 Global OpinionQA 64.2 62.5 62.4 V anilla Self-Attention Feed-Forward Pe...

work page 2020
[60]

M. A. Abbasi, A. Ghafouri, M. Firouzmandi, H. Naderi, and B. M. Bidgoli. Persianllama:273 Towards building ﬁrst persian large language model. arXiv preprint arXiv:2312.15713 , 2023.274

work page arXiv 2023
[61]

Artetxe, S

M. Artetxe, S. Ruder, and D. Y ogatama. On the cross-lingual transferability of monolingual275 representations. In D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, editors, Proceedings of276 the 58th Annual Meeting of the Association for Computational Linguistics , pages 4623–4637,277 Online, July 2020. Association for Computational Linguistics. doi: ...

work page doi:10.18653/v1/2020.acl-main.278 2020
[62]

URL https://aclanthology.org/2020.acl-main.421 .279

work page 2020
[63]

Biderman, H

S. Biderman, H. Schoelkopf, Q. G. Anthony, H. Bradley, K. O’Brien, E. Hallahan, M. A.280 Khan, S. Purohit, U. S. Prashanth, E. Raff, et al. Pythia: A suite for analyzing large language281 models across training and scaling. In International Conference on Machine Learning , pages282 2397–2430. PMLR, 2023.283

work page 2023
[64]

Brown, B

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P . Dhariwal, A. Neelakantan, P . Shyam,284 G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in neural285 information processing systems , 33:1877–1901, 2020.286

work page 1901
[65]

Cahyawijaya, H

S. Cahyawijaya, H. Lovenia, T. Y u, W. Chung, and P . Fung. InstructAlign: High-and-low287 resource language alignment via continual crosslingual instruction tuning. In D. Wijaya, A. F.288 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 10 0 10 20 30 Source Layer Accuracy Gains / Declines Average source layer resu...

work page

[1] [2]

Artetxe, S

M. Artetxe, S. Ruder, and D. Yogatama. On the cross-lingual transferability of monolingual representations. In D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, editors,Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4623–4637, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/...

work page doi:10.18653/v1/2020.acl-main 2020

[2] [3]

URL https://aclanthology.org/2020.acl-main.421

work page 2020

[3] [4]

Biderman, H

S. Biderman, H. Schoelkopf, Q. G. Anthony, H. Bradley, K. O’Brien, E. Hallahan, M. A. Khan, S. Purohit, U. S. Prashanth, E. Raff, et al. Pythia: A suite for analyzing large language models across training and scaling. In International Conference on Machine Learning, pages 2397–2430. PMLR, 2023

work page 2023

[4] [5]

Brown, B

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020

work page 1901

[5] [6]

Cahyawijaya, H

S. Cahyawijaya, H. Lovenia, T. Yu, W. Chung, and P. Fung. InstructAlign: High-and-low resource language alignment via continual crosslingual instruction tuning. In D. Wijaya, A. F. Aji, C. Vania, G. I. Winata, and A. Purwarianti, editors, Proceedings of the First Workshop in South East Asian Language Processing , pages 55–78, Nusa Dua, Bali, Indonesia, Nov

work page

[6] [7]

doi: 10.18653/v1/2023.sealp-1.5

Association for Computational Linguistics. doi: 10.18653/v1/2023.sealp-1.5. URL https://aclanthology.org/2023.sealp-1.5

work page doi:10.18653/v1/2023.sealp-1.5 2023

[7] [8]

P. Chen, S. Ji, N. Bogoychev, A. Kutuzov, B. Haddow, and K. Heafield. Monolingual or multilingual instruction tuning: Which makes a better alpaca. In Y . Graham and M. Purver, editors, Findings of the Association for Computational Linguistics: EACL 2024, pages 1347– 1356, St. Julian’s, Malta, Mar. 2024. Association for Computational Linguistics. URL https...

work page 2024

[8] [9]

Z. Chen, F. Jiang, J. Chen, T. Wang, F. Yu, G. Chen, H. Zhang, J. Liang, C. Zhang, Z. Zhang, et al. Phoenix: Democratizing chatgpt across languages. arXiv preprint arXiv:2304.10453, 2023

work page arXiv 2023

[9] [10]

Conneau and G

A. Conneau and G. Lample. Cross-lingual language model pretraining. Advances in neural information processing systems, 32, 2019

work page 2019

[10] [11]

Conneau, R

A. Conneau, R. Rinott, G. Lample, A. Williams, S. R. Bowman, H. Schwenk, and V . Stoyanov. Xnli: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Confer- ence on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018

work page 2018

[11] [12]

Conneau, K

A. Conneau, K. Khandelwal, N. Goyal, V . Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V . Stoyanov. Unsupervised cross-lingual representation learning at scale. In D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, editors,Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages 8440–8451,...

work page

[12] [13]

Unsupervised Cross-lingual Representation Learning at Scale , booktitle =

Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.747. URL https://aclanthology.org/2020.acl-main.747

work page doi:10.18653/v1/2020.acl-main.747 2020

[13] [14]

Y . Cui, Z. Yang, and X. Yao. Efficient and effective text encoding for chinese llama and alpaca. arXiv preprint arXiv:2304.08177, 2023

work page arXiv 2023

[14] [15]

BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In J. Burstein, C. Doran, and T. Solorio, editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Pa- p...

work page doi:10.18653/v1/n19-1423 2019

[15] [16]

Q. Dong, L. Li, D. Dai, C. Zheng, Z. Wu, B. Chang, X. Sun, J. Xu, and Z. Sui. A survey for in-context learning. ArXiv preprint, abs/2301.00234, 2023. URL https://arxiv.org/abs/ 2301.00234

work page internal anchor Pith review Pith/arXiv arXiv 2023

[16] [17]

Towards Measuring the Representation of Subjective Global Opinions in Language Models

E. Durmus, K. Nguyen, T. I. Liao, N. Schiefer, A. Askell, A. Bakhtin, C. Chen, Z. Hatfield- Dodds, D. Hernandez, N. Joseph, et al. Towards measuring the representation of subjective global opinions in language models. arXiv preprint arXiv:2306.16388, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[17] [18]

C. Gao, H. Hu, P. Hu, J. Chen, J. Li, and S. Huang. Multilingual pretraining and instruc- tion tuning improve cross-lingual knowledge alignment, but only shallowly. arXiv preprint arXiv:2404.04659, 2024

work page arXiv 2024

[18] [19]

Chinese-mixtral-8x7b: An open-source mixture-of-experts llm

HIT-SCIR. Chinese-mixtral-8x7b: An open-source mixture-of-experts llm. https://github. com/HIT-SCIR/Chinese-Mixtral-8x7B , 2024

work page 2024

[19] [20]

S. R. Indurthi, W. Zhou, S. Chollampatt, R. Agrawal, K. Song, L. Zhao, and C. Zhu. Improving multilingual instruction finetuning via linguistically natural and diverse datasets. In Y . Al- Onaizan, M. Bansal, and Y .-N. Chen, editors,Findings of the Association for Computational Linguistics: EMNLP 2024, pages 2306–2323, Miami, Florida, USA, Nov. 2024. Ass...

work page doi:10.18653/v1/2024.findings-emnlp.128 2024

[20] [22]

A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao, T. Lavril, T. Wang, T. Lacroix, and W. E. Sayed. Mistral 7b, 2023. URL https://arxiv.org/abs/ 2310.06825

work page internal anchor Pith review Pith/arXiv arXiv 2023

[21] [23]

T. Kew, F. Schottmann, and R. Sennrich. Turning English-centric LLMs into polyglots: How much multilinguality is needed? In Y . Al-Onaizan, M. Bansal, and Y .-N. Chen, editors,Findings of the Association for Computational Linguistics: EMNLP 2024, pages 13097–13124, Miami, Florida, USA, Nov. 2024. Association for Computational Linguistics. doi: 10.18653/v1...

work page doi:10.18653/v1/2024 2024

[22] [24]

Khurana, N

S. Khurana, N. Dawalatabad, A. Laurent, L. Vicente, P. Gimeno, V . Mingote, and J. Glass. Cross-lingual transfer learning for low-resource speech translation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

work page 2024

[23] [25]

Kojima, I

T. Kojima, I. Okimura, Y . Iwasawa, H. Yanaka, and Y . Matsuo. On the multilingual ability of decoder-based pre-trained language models: Finding and controlling language-specific neurons. arXiv preprint arXiv:2404.02431, 2024

work page arXiv 2024

[24] [26]

Kovaˇc, M

G. Kovaˇc, M. Sawayama, R. Portelas, C. Colas, P. F. Dominey, and P.-Y . Oudeyer. Large language models as superpositions of cultural perspectives. arXiv preprint arXiv:2307.07870, 2023

work page arXiv 2023

[25] [27]

C. Li, M. Chen, J. Wang, S. Sitaram, and X. Xie. Culturellm: Incorporating cultural differences into large language models. arXiv preprint arXiv:2402.10946, 2024

work page arXiv 2024

[26] [28]

J. Li, S. Huang, X. Dai, and J. Chen. Prealign: Boosting cross-lingual transfer by early establishment of multilingual alignment. arXiv preprint arXiv:2407.16222, 2024

work page arXiv 2024

[27] [29]

P. Lin, S. Ji, J. Tiedemann, A. F. Martins, and H. Schütze. Mala-500: Massive language adaptation of large language models. arXiv preprint arXiv:2401.13303, 2024

work page arXiv 2024

[28] [30]

X. V . Lin, T. Mihaylov, M. Artetxe, T. Wang, S. Chen, D. Simig, M. Ott, N. Goyal, S. Bhos- ale, J. Du, et al. Few-shot learning with multilingual language models. arXiv preprint arXiv:2112.10668, 2021. 11

work page arXiv 2021

[29] [31]

Lin and Y .-N

Y .-T. Lin and Y .-N. Chen. Taiwan llm: Bridging the linguistic divide with a culturally aligned language model. arXiv preprint arXiv:2311.17487, 2023

work page arXiv 2023

[30] [32]

P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9):1–35, 2023

work page 2023

[31] [33]

Crosslingual generalization through multitask ﬁnetuning

N. Muennighoff, T. Wang, L. Sutawika, A. Roberts, S. Biderman, T. L. Scao, M. S. Bari, S. Shen, Z.-X. Yong, H. Schoelkopf, et al. Crosslingual generalization through multitask finetuning. arXiv preprint arXiv:2211.01786, 2022

work page arXiv 2022

[32] [34]

Nguyen, W

X.-P. Nguyen, W. Zhang, X. Li, M. Aljunied, Z. Hu, C. Shen, Y . K. Chia, X. Li, J. Wang, Q. Tan, L. Cheng, G. Chen, Y . Deng, S. Yang, C. Liu, H. Zhang, and L. Bing. SeaLLMs - large language models for Southeast Asia. In Y . Cao, Y . Feng, and D. Xiong, editors,Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume...

work page 2024

[33] [35]

Pires, H

R. Pires, H. Abonizio, T. S. Almeida, and R. Nogueira. Sabiá: Portuguese large language models. In Brazilian Conference on Intelligent Systems, pages 226–240. Springer, 2023

work page 2023

[34] [36]

L. Qin, Q. Chen, Y . Zhou, Z. Chen, Y . Li, L. Liao, M. Li, W. Che, and P. S. Yu. Multilin- gual large language model: A survey of resources, taxonomy and frontiers. arXiv preprint arXiv:2404.04925, 2024

work page arXiv 2024

[35] [37]

Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Xin Zhao, and Ji-Rong Wen

A. Ramezani and Y . Xu. Knowledge of cultural moral norms in large language models. In A. Rogers, J. Boyd-Graber, and N. Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages 428–446, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2...

work page doi:10.18653/v1/2023 2023

[36] [38]

A. Rao, A. Yerukola, V . Shah, K. Reinecke, and M. Sap. Normad: A benchmark for measuring the cultural adaptability of large language models. arXiv preprint arXiv:2404.12464, 2024

work page arXiv 2024

[37] [39]

A. S. Rao, A. Khandelwal, K. Tanmay, U. Agarwal, and M. Choudhury. Ethical reason- ing over moral alignment: A case and framework for in-context ethical policies in LLMs. In H. Bouamor, J. Pino, and K. Bali, editors, Findings of the Association for Compu- tational Linguistics: EMNLP 2023 , pages 13370–13388, Singapore, Dec. 2023. Associ- ation for Computa...

work page doi:10.18653/v1/2023.findings-emnlp.892 2023

[38] [40]

Reid and M

M. Reid and M. Artetxe. On the role of parallel data in cross-lingual transfer learning. arXiv preprint arXiv:2212.10173, 2022

work page arXiv 2022

[39] [41]

T. L. Scao, A. Fan, C. Akiki, E. Pavlick, S. Ili ´c, D. Hesslow, R. Castagné, A. S. Luccioni, F. Yvon, M. Gallé, et al. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[40] [42]

Shanahan

M. Shanahan. Talking about large language models. ArXiv preprint, abs/2212.03551, 2022. URL https://arxiv.org/abs/2212.03551

work page arXiv 2022

[41] [43]

W. Shi, R. Li, Y . Zhang, C. Ziems, R. Horesh, R. A. de Paula, D. Yang, et al. Culturebank: An online community-driven knowledge base towards culturally aware language technologies. arXiv preprint arXiv:2404.15238, 2024

work page arXiv 2024

[42] [44]

Singh, F

S. Singh, F. Vargus, D. Dsouza, B. F. Karlsson, A. Mahendiran, W.-Y . Ko, H. Shandilya, J. Patel, D. Mataciunas, L. OMahony, M. Zhang, R. Hettiarachchi, J. Wilson, M. Machado, L. S. Moura, D. Krzemi´nski, H. Fadaei, I. Ergün, I. Okoh, A. Alaagib, O. Mudannayake, Z. Alyafeai, V . M. Chien, S. Ruder, S. Guthikonda, E. A. Alghamdi, S. Gehrmann, N. Muennighof...

work page 2024

[43] [45]

T. Tang, W. Luo, H. Huang, D. Zhang, X. Wang, X. Zhao, F. Wei, and J.-R. Wen. Language- specific neurons: The key to multilingual capabilities in large language models. arXiv preprint arXiv:2402.16438, 2024

work page arXiv 2024

[44] [46]

LLaMA: Open and Efficient Foundation Language Models

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[45] [47]

W. Wang, W. Jiao, J. Huang, R. Dai, J.-t. Huang, Z. Tu, and M. Lyu. Not all countries celebrate thanksgiving: On the cultural dominance in large language models. In L.-W. Ku, A. Martins, and V . Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6349–6384, Bangkok, Thai...

work page 2024

[46] [48]

Z. Wang, Z. C. Lipton, and Y . Tsvetkov. On negative interference in multilingual models: Findings and a meta-learning treatment. In B. Webber, T. Cohn, Y . He, and Y . Liu, editors, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4438–4450, Online, Nov. 2020. Association for Computational Linguis- tic...

work page doi:10.18653/v1/2020.emnlp-main.359 2020

[47] [49]

J. Wei, Y . Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, et al. Emergent abilities of large language models. ArXiv preprint, abs/2206.07682, 2022. URL https://arxiv.org/abs/2206.07682

work page internal anchor Pith review Pith/arXiv arXiv 2022

[48] [50]

J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V . Le, D. Zhou, et al. Chain-of- thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022

work page 2022

[49] [51]

A. Yang, B. Yang, B. Hui, B. Zheng, B. Yu, C. Zhou, C. Li, C. Li, D. Liu, F. Huang, G. Dong, H. Wei, H. Lin, J. Tang, J. Wang, J. Yang, J. Tu, J. Zhang, J. Ma, J. Yang, J. Xu, J. Zhou, J. Bai, J. He, J. Lin, K. Dang, K. Lu, K. Chen, K. Yang, M. Li, M. Xue, N. Ni, P. Zhang, P. Wang, R. Peng, R. Men, R. Gao, R. Lin, S. Wang, S. Bai, S. Tan, T. Zhu, T. Li, T...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[50] [52]

J. Ye, X. Tao, and L. Kong. Language versatilists vs. specialists: An empirical revisiting on multilingual transfer ability. arXiv preprint arXiv:2306.06688, 2023

work page arXiv 2023

[51] [53]

Y . Ye, X. Feng, X. Feng, W. Ma, L. Qin, D. Xu, Q. Yang, H. Liu, and B. Qin. GlobeSumm: A challenging benchmark towards unifying multi-lingual, cross-lingual and multi-document news summarization. In Y . Al-Onaizan, M. Bansal, and Y .-N. Chen, editors,Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 10803–10821...

work page 2024

[52] [54]

OPT: Open Pre-trained Transformer Language Models

S. Zhang, S. Roller, N. Goyal, M. Artetxe, M. Chen, S. Chen, C. Dewan, M. Diab, X. Li, X. V . Lin, et al. Opt: Open pre-trained transformer language models.arXiv preprint arXiv:2205.01068, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[53] [55]

W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y . Hou, Y . Min, B. Zhang, J. Zhang, Z. Dong, et al. A survey of large language models. ArXiv preprint, abs/2303.18223, 2023. URL https://arxiv.org/abs/2303.18223

work page internal anchor Pith review Pith/arXiv arXiv 2023

[54] [56]

Y . Zhao, W. Zhang, G. Chen, K. Kawaguchi, and L. Bing. How do large language models handle multilingualism? arXiv preprint arXiv:2402.18815, 2024

work page arXiv 2024

[55] [59]

Win, Tie, Lose

URLhttps://aclanthology.org/2020.acl-main.421 .279 9 llama Datasets XNLI 30.1 33.2 30.5 XQuAD 33.5 31.3 31.9 Global OpinionQA 32.1 25.8 29.0 mistral Datasets XNLI 37.7 40.3 38.0 XQuAD 39.8 35.9 39.9 Global OpinionQA 68.3 66.4 66.5 qwen Datasets XNLI 55.2 55.2 54.4 XQuAD 47.3 44.2 45.5 Global OpinionQA 64.2 62.5 62.4 V anilla Self-Attention Feed-Forward Pe...

work page 2020

[56] [60]

M. A. Abbasi, A. Ghafouri, M. Firouzmandi, H. Naderi, and B. M. Bidgoli. Persianllama:273 Towards building ﬁrst persian large language model. arXiv preprint arXiv:2312.15713 , 2023.274

work page arXiv 2023

[57] [61]

Artetxe, S

M. Artetxe, S. Ruder, and D. Y ogatama. On the cross-lingual transferability of monolingual275 representations. In D. Jurafsky, J. Chai, N. Schluter, and J. Tetreault, editors, Proceedings of276 the 58th Annual Meeting of the Association for Computational Linguistics , pages 4623–4637,277 Online, July 2020. Association for Computational Linguistics. doi: ...

work page doi:10.18653/v1/2020.acl-main.278 2020

[58] [62]

URL https://aclanthology.org/2020.acl-main.421 .279

work page 2020

[59] [63]

Biderman, H

S. Biderman, H. Schoelkopf, Q. G. Anthony, H. Bradley, K. O’Brien, E. Hallahan, M. A.280 Khan, S. Purohit, U. S. Prashanth, E. Raff, et al. Pythia: A suite for analyzing large language281 models across training and scaling. In International Conference on Machine Learning , pages282 2397–2430. PMLR, 2023.283

work page 2023

[60] [64]

Brown, B

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P . Dhariwal, A. Neelakantan, P . Shyam,284 G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in neural285 information processing systems , 33:1877–1901, 2020.286

work page 1901

[61] [65]

Cahyawijaya, H

S. Cahyawijaya, H. Lovenia, T. Y u, W. Chung, and P . Fung. InstructAlign: High-and-low287 resource language alignment via continual crosslingual instruction tuning. In D. Wijaya, A. F.288 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 10 0 10 20 30 Source Layer Accuracy Gains / Declines Average source layer resu...

work page