arxiv: 2412.12686 · v3 · submitted 2024-12-17 · 💻 cs.CL

Exploring Cross-lingual Latent Transplantation: Mutual Opportunities and Open Challenges

Pith reviewed 2026-05-23 07:02 UTC · model grok-4.3

classification 💻 cs.CL
keywords cross-lingual latent transplantation · multilingual LLMs · cultural adaptability · latent activations · low-resource languages · attention modules · feed-forward modules · XTransplant

The pith

Cross-lingual latent transplantation improves multilingual capability and cultural adaptability in LLMs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces XTransplant, a probing framework that transplants latent activations across languages during inference to let models combine complementary strengths from English and non-English resources. This form of cross-lingual interaction produces mutual gains in multilingual task performance and cultural adaptability, with larger benefits for low-resource languages and cultures. The authors further show that attention modules drive multilingual understanding while feed-forward modules handle culture-specific knowledge. Experiments also indicate that current LLMs leave substantial multilingual potential unused.

Core claim

XTransplant is a probing framework that transplants latent activations across languages to harness complementary strengths of English and non-English resources. Empirical analysis shows this cross-lingual interaction has mutually beneficial effects on multilingual capability and cultural adaptability of LLMs, particularly for low-resource languages and cultures. Attention modules play a pivotal role in multilingual understanding, while feed-forward modules capture culture-specific knowledge. The work exposes considerable underutilization of current LLMs' multilingual potential.

What carries the argument

The argument rests on the XTransplant framework, which transplants latent activations across languages during inference.
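As a rough illustration of what transplanting a latent activation during inference can look like, the sketch below runs a toy residual network twice: once on a source-language input to capture one module's output, and once on a target-language input with that module's output overwritten. Everything in it is an assumption made for illustration (the toy model, the names `make_toy_model`, `forward`, and `transplant`, and the choice to overwrite a whole module output). It is not the paper's implementation, and a real LLM would need hooks on its attention or feed-forward blocks instead.

```python
import math
import random


def make_toy_model(n_layers=4, d=6, seed=0):
    """Hypothetical stand-in for an LLM: each layer has an 'attn' and an 'ffn' weight matrix."""
    rng = random.Random(seed)

    def mat():
        return [[rng.gauss(0.0, 0.3) for _ in range(d)] for _ in range(d)]

    return [{"attn": mat(), "ffn": mat()} for _ in range(n_layers)]


def module_out(h, w):
    """Toy module: tanh of a vector-matrix product."""
    return [math.tanh(sum(h[k] * w[k][j] for k in range(len(h)))) for j in range(len(w[0]))]


def forward(model, x, capture=None, patch=None):
    """Run the toy residual stack.

    capture: (layer, module) whose output is returned next to the final state.
    patch:   (layer, module, activation) that overwrites that module's output.
    """
    h = list(x)
    captured = None
    for i, layer in enumerate(model):
        for name in ("attn", "ffn"):
            out = module_out(h, layer[name])
            if patch is not None and patch[:2] == (i, name):
                out = list(patch[2])  # the transplanted activation replaces the native one
            if capture == (i, name):
                captured = list(out)
            h = [a + b for a, b in zip(h, out)]  # residual connection
    return h, captured


def transplant(model, x_source, x_target, source_layer, target_layer, module):
    """Capture `module` output at source_layer on x_source, inject it at target_layer on x_target."""
    _, activation = forward(model, x_source, capture=(source_layer, module))
    final_state, _ = forward(model, x_target, patch=(target_layer, module, activation))
    return final_state


model = make_toy_model()
rng = random.Random(1)
x_english = [rng.gauss(0.0, 1.0) for _ in range(6)]  # stands in for an English prompt
x_other = [rng.gauss(0.0, 1.0) for _ in range(6)]    # stands in for a non-English prompt
patched = transplant(model, x_english, x_other, source_layer=1, target_layer=2, module="ffn")
```

Transplanting an input's own activation back into the same layer and module leaves the output unchanged, which makes a convenient sanity check before trying cross-lingual pairs.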

If this is right

  • XTransplant yields mutual improvements in multilingual capability for both high- and low-resource languages.
  • XTransplant yields mutual improvements in cultural adaptability, especially for low-resource cultures.
  • Attention modules support multilingual understanding.
  • Feed-forward modules are more effective at capturing culture-specific knowledge.
  • Current LLMs leave substantial internalized multilingual knowledge underutilized.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same activation-transplant approach could be tested on tasks that cross other boundaries such as domain or modality.
  • Stability results from the paper suggest XTransplant might be combined with existing alignment methods to reduce English-centric bias without full retraining.
  • The exposed performance gap implies that inference-time interventions may be a cheaper route to multilingual gains than additional pre-training data collection.

Load-bearing premise

Observed performance changes result specifically from transplanting latent activations rather than from other uncontrolled factors in the experimental procedure.

What would settle it

A controlled run in which transplanting the same activations produces no performance change or produces degradation after matching all other experimental variables.

Figures

Figures reproduced from arXiv: 2412.12686 by Bing Qin, Dandan Tu, Duyu Tang, Lei Huang, Libo Qin, Qichen Hong, Weitao Ma, Xiachong Feng, Xiaocheng Feng, Xiaohui Yan, Yangfan Ye, Yichong Huang, Yunfei Lu, Zhirui Zhang.

Figure 2. Caption truncated in extraction.
Figure 2 (plot panels). Only labels survive: model "qwen", axis "Win Rate", panel title "(Self-Attention)".
Figure 3. Layer-wise average results across different LLMs and PilotSets. Surviving axis labels: Source Layer, Target Layer, Accuracy Gains / Declines. Surviving language labels: Filipino, Amharic, Chinese. A note on the figure states that the results of Qwen2-7B-Instruct on XCOPA show anomalous fluctuations; the rest of the note is truncated.
Figure 4. The layer-wise instance-aware upper bound results across different LLMs and … (caption truncated)
Figure 8. An intermediate decoding case study of transplanting the feed-forward activations from … (caption truncated)
Original abstract

Current large language models (LLMs) often exhibit imbalances in multilingual capabilities and cultural adaptability, largely attributed to their English-centric pre-training data. In this paper, we introduce and investigate cross-lingual latent transplantation (XTransplant), a probing framework which aims to further exploit the model's internalized multilingual knowledge during inference and examine its effects on the multilingual capability and cultural adaptability of LLMs. The XTransplant framework enables models to harness the complementary strengths of both English and non-English resources by transplanting latent activations across languages. Through extensive analysis, we empirically demonstrate that XTransplant, a form of cross-lingual interaction, has mutually beneficial effects on the multilingual capability and cultural adaptability of LLMs, particularly for low-resource languages and cultures. We further reveal that attention modules play a pivotal role in supporting multilingual understanding, while feed-forward modules are more adept at capturing culture-specific knowledge. In addition, we conduct an in-depth analysis of XTransplant's stability, effectiveness, and generalizability. By probing the upper bound performance of XTransplant, we expose the considerable underutilization of current LLMs' multilingual potential, a challenge that remains open. We hope our analysis offers a new lens for advancing cross-lingual interactions and better leveraging models' internalized multilingual knowledge.
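The abstract mentions probing an upper bound, and the Figure 4 caption calls it instance-aware. One plausible reading, sketched below, is that an instance counts as solved if any (source layer, target layer) pair answers it correctly, which is then compared with the best single pair applied to every instance. This reading, the function names, and the toy correctness flags are assumptions for illustration; they are not the paper's definition or its results.

```python
def accuracy(flags):
    """Fraction of True values in a list of per-instance correctness flags."""
    return sum(flags) / len(flags)


def instance_aware_upper_bound(correct):
    """correct[n] maps (source_layer, target_layer) -> bool for instance n.

    An instance counts as solved if at least one layer pair gets it right.
    """
    return sum(any(per_pair.values()) for per_pair in correct) / len(correct)


def best_fixed_pair(correct):
    """Best accuracy achievable when one layer pair is used for every instance."""
    pairs = correct[0].keys()
    scores = {p: sum(inst[p] for inst in correct) / len(correct) for p in pairs}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy flags for four instances and two layer pairs (illustrative, not from the paper).
vanilla = [True, False, False, False]
correct = [
    {(0, 1): True, (1, 1): False},
    {(0, 1): False, (1, 1): True},
    {(0, 1): True, (1, 1): False},
    {(0, 1): False, (1, 1): False},
]

print(accuracy(vanilla))                    # 0.25
print(best_fixed_pair(correct))             # ((0, 1), 0.5)
print(instance_aware_upper_bound(correct))  # 0.75
```

Under this reading, the gap between the best fixed pair and the instance-aware figure measures how much depends on choosing the right layers per instance, which is one way to interpret the underutilization the abstract describes.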

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces XTransplant, a probing framework that transplants latent activations across languages during inference in LLMs. It claims this cross-lingual interaction yields mutually beneficial effects on multilingual capability and cultural adaptability (especially for low-resource languages), identifies attention modules as key for multilingual understanding and FFN modules for culture-specific knowledge, analyzes stability/effectiveness/generalizability, and concludes that current LLMs underutilize their multilingual potential.

Significance. If the reported gains are causally due to transplantation and replicate under controls, the work would offer an inference-time method to exploit internalized multilingual knowledge without retraining, with the module-role findings and upper-bound analysis providing concrete directions for future cross-lingual interaction research.

major comments (1)
  1. [Experimental results and analysis] The central claim that XTransplant produces mutually beneficial effects requires isolating the contribution of cross-lingual activation transplantation from other factors (e.g., changes in activation statistics or inference dynamics). The experimental sections do not describe matched controls such as same-language transplantation, random activation swaps, or frozen-module baselines that would rule out these alternatives.
minor comments (1)
  1. [Introduction / Method] Notation for the transplanted activations and the precise definition of 'mutually beneficial' (e.g., symmetric improvement thresholds) should be formalized earlier to aid reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the importance of rigorous controls to support the central claims regarding XTransplant's effects. We address the major comment point-by-point below.

Point-by-point responses
  1. Referee: [Experimental results and analysis] The central claim that XTransplant produces mutually beneficial effects requires isolating the contribution of cross-lingual activation transplantation from other factors (e.g., changes in activation statistics or inference dynamics). The experimental sections do not describe matched controls such as same-language transplantation, random activation swaps, or frozen-module baselines that would rule out these alternatives.

    Authors: We agree that additional matched controls are needed to more convincingly isolate the contribution of cross-lingual transplantation. In the revised manuscript we will add (1) same-language transplantation baselines, (2) random activation swap controls, and (3) frozen-module ablations where feasible. These will be reported alongside the existing stability, effectiveness, and generalizability analyses to strengthen the causal interpretation of the mutual benefits.

    Revision: yes
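The matched controls discussed in this exchange could be set up along the following lines. The sketch builds the activation to inject under three conditions (cross-lingual, same-language, and random noise rescaled to the same mean and standard deviation) and reports the gain of the cross-lingual condition over the strongest control. The function names, the statistic-matching choice, and the condition labels are assumptions for illustration, not procedures taken from the paper. The frozen-module ablation is omitted because the text does not say how it would be defined.

```python
import random
import statistics


def matched_random_activation(reference, rng):
    """Gaussian noise rescaled to the reference activation's mean and standard deviation.

    Assumes the reference has at least two distinct values.
    """
    mu = statistics.fmean(reference)
    sigma = statistics.pstdev(reference)
    noise = [rng.gauss(0.0, 1.0) for _ in reference]
    noise_mu = statistics.fmean(noise)
    noise_sigma = statistics.pstdev(noise)
    return [mu + sigma * (v - noise_mu) / noise_sigma for v in noise]


def build_conditions(cross_lingual_act, same_language_act, seed=0):
    """Activations to inject under each condition, keyed by a hypothetical condition label."""
    rng = random.Random(seed)
    return {
        "cross_lingual": list(cross_lingual_act),
        "same_language": list(same_language_act),
        "random_matched": matched_random_activation(cross_lingual_act, rng),
    }


def attributable_gain(accuracy_by_condition, treatment="cross_lingual"):
    """Accuracy of the treatment minus the best accuracy among all other conditions."""
    controls = [v for k, v in accuracy_by_condition.items() if k != treatment]
    return accuracy_by_condition[treatment] - max(controls)
```

A positive `attributable_gain` would indicate that the cross-lingual source matters beyond the act of overwriting an activation; a value near zero or below would point toward the alternative explanations the referee raises.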

Circularity Check

0 steps flagged

No circularity; empirical claims rest on experimental measurements, not self-referential definitions or fitted inputs

full rationale

The paper introduces XTransplant as an empirical probing framework and reports observed performance changes on multilingual and cultural metrics. No equations, derivations, parameter-fitting steps, or self-citation chains appear in the abstract or described structure. Central claims are presented as outcomes of transplantation experiments rather than quantities defined in terms of the inputs themselves. No self-definitional, fitted-prediction, or ansatz-smuggling patterns are present. The work is self-contained against external benchmarks as an observational study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review is abstract-only, so the ledger is limited to what is stated in the abstract. The central claim rests on the domain assumption that LLMs have already internalized usable multilingual and cultural knowledge in their latent activations.

axioms (1)
  • domain assumption: LLMs internalize multilingual knowledge and cultural adaptability during pre-training, and these can be accessed and transferred via latent activations.
    Explicitly stated in the abstract as the motivation and mechanism for XTransplant.

pith-pipeline@v0.9.0 · 5798 in / 1215 out tokens · 66209 ms · 2026-05-23T07:02:16.079774+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

    cs.LG · 2026-04 · unverdicted · novelty 6.0

    COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
