pith. machine review for the scientific record.

arxiv: 2605.14404 · v1 · submitted 2026-05-14 · 💻 cs.CL

Recognition: 2 Lean theorem links

Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 02:09 UTC · model grok-4.3

classification 💻 cs.CL
keywords multilingual machine unlearning · knowledge separability score · knowledge persistence score · cross-lingual evaluation · PII leakage · LLM privacy · information removal

The pith

Two metrics called KSS and KPS measure how consistently unlearning removes information across languages in multilingual LLMs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard unlearning checks, which test one language at a time, miss how knowledge leaks or persists between languages in models trained on mixed-language data. It introduces the Knowledge Separability Score to rate overall unlearning quality across all languages together and the Knowledge Persistence Score to check whether forgetting holds steady between specific language pairs. These scores reveal patterns unique to multilingual unlearning that single-language tests overlook. A reader who cares about privacy in global AI services would use them to verify that sensitive details truly disappear no matter which language a user queries.

Core claim

Prior MMU evaluations extend per-language protocols without capturing cross-linguistic information distribution, so the authors define KSS to quantify overall unlearning quality across multiple languages and KPS to assess consistent removal between language pairs; applying these metrics to various unlearning methods yields insights into phenomena exclusive to the multilingual setting.

What carries the argument

Knowledge Separability Score (KSS) and Knowledge Persistence Score (KPS), which together quantify cross-language information spread and removal consistency.
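
The paper's exact formulas are not shown on this page, but the Figure 4 description pins down the shape of the computation: a per-instance score S_i aggregated over each knowledge instance's languages, with KSS-ROC as overall target/non-target separability and KSS-PR as ranking consistency. The sketch below is a minimal reading under those assumptions; the function names, the max-aggregation, and the use of scikit-learn's ROC-AUC and average-precision routines are illustrative choices, not the authors' implementation.

```python
# Hedged sketch of a KSS-style computation; not the paper's exact definition.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def aggregate_si(per_language_scores: np.ndarray) -> np.ndarray:
    """Collapse a (num_instances, num_languages) score matrix to one S_i per
    instance. Max is one plausible choice (knowledge reachable in any
    language counts); the paper may aggregate differently."""
    return per_language_scores.max(axis=1)

def kss_sketch(target_si: np.ndarray, non_target_si: np.ndarray) -> dict:
    """Target (forget-set) knowledge is the positive class. Per Figure 4,
    successful unlearning should assign higher S_i to target knowledge, so
    values near 1.0 indicate clean separation and values near 0.5 indicate
    the forget and retain sets are indistinguishable."""
    scores = np.concatenate([target_si, non_target_si])
    labels = np.concatenate([np.ones(len(target_si)),
                             np.zeros(len(non_target_si))])
    return {"kss_roc": roc_auc_score(labels, scores),          # separability
            "kss_pr": average_precision_score(labels, scores)}  # ranking consistency
```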

If this is right

  • Unlearning methods must now be ranked by their ability to produce high KSS and stable KPS rather than by single-language accuracy alone.
  • Multilingual training corpora require new unlearning techniques that explicitly target cross-language knowledge links.
  • Evaluation protocols for commercial LLMs will need to report both KSS and KPS to demonstrate privacy compliance across markets.
  • Developers can use the two scores to detect when forgetting in one language fails to transfer to another; a sketch of such a pairwise check follows this list.
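
As a concrete shape for that last point, here is a minimal sketch of a pairwise persistence check in the spirit of KPS. Everything in it is hypothetical: `accessible` is a boolean instance-by-language matrix obtained by thresholding per-language scores, and the (a, b) cell approximates what Figure 4 says KPS measures, knowledge inaccessible in language a that persists in language b. It is an illustrative reading, not the paper's definition.

```python
# Hedged sketch of a KPS-style pairwise check; not the paper's exact definition.
import numpy as np

def kps_sketch(accessible: np.ndarray) -> np.ndarray:
    """accessible: (num_target_instances, num_languages) boolean matrix,
    True where a forget-set fact is still extractable in that language
    after unlearning. Returns a (num_languages, num_languages) matrix whose
    (a, b) entry is the fraction of instances forgotten in language a but
    still accessible in language b; large off-diagonal entries flag
    forgetting that failed to transfer across languages."""
    n_lang = accessible.shape[1]
    kps = np.zeros((n_lang, n_lang))
    for a in range(n_lang):
        forgotten_in_a = ~accessible[:, a]
        denom = max(int(forgotten_in_a.sum()), 1)  # avoid division by zero
        for b in range(n_lang):
            if a != b:
                kps[a, b] = (forgotten_in_a & accessible[:, b]).sum() / denom
    return kps
```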

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Service providers could add KSS and KPS to automated monitoring dashboards to flag incomplete unlearning before deployment.
  • The metrics might generalize to other multi-modal settings where knowledge spreads across image captions or code comments in different languages.
  • Future benchmarks could combine KSS/KPS with direct fact-recall tests to create a fuller picture of forgetting quality.

Load-bearing premise

That KSS and KPS scores correctly reflect actual information leakage rates across languages without needing separate checks against human judgments or direct leakage measurements.

What would settle it

A controlled experiment that measures real leakage rates of specific facts after unlearning and finds no correlation between those rates and the proposed KSS or KPS values.
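
In code, that settling experiment has a simple skeleton: collect one direct leakage measurement (e.g., PII extraction success rate) and one score per unlearning run, then test for rank correlation. The arrays below are hypothetical placeholders, one entry per run; the paper reports no such study, so this is only the experiment's shape, not a result.

```python
# Hedged skeleton of the validation experiment; all inputs are hypothetical.
from scipy.stats import spearmanr

def settle(kss_values, leakage_rates, alpha=0.05):
    """A strong negative rank correlation (higher KSS, lower measured
    leakage) would support the metric; no detectable correlation across a
    reasonable number of runs would undermine it."""
    rho, p = spearmanr(kss_values, leakage_rates)
    verdict = "correlated" if p < alpha else "no detectable correlation"
    return rho, p, verdict

# Example with made-up numbers (three unlearning runs):
# rho, p, verdict = settle([0.95, 0.80, 0.60], [0.01, 0.10, 0.30])
```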

Figures

Figures reproduced from arXiv: 2605.14404 by Hyeonjin Kim, Kyomin Hwang, Nojun Kwak, Sangyeon Cho.

Figure 1: Illustration of the evaluation method in conventional MMU. Existing approaches evaluate knowledge (e.g., Birth, Hobby) independently for each language. Consequently, this language-wise assessment fails to verify whether knowledge that has spread across different languages has been successfully removed. Choi et al. argue that relying solely on English data leads to insufficient forgetting if the target knowledg…
Figure 2: Overview illustration of our setting. A knowledge refers to an instance which may be expressed multilingually. Target Knowledge is the knowledge in the forget set, while Non-Target Knowledge is the one in the retain set. In this setting, we propose metrics specifically designed for the evaluation of the knowledge. Each k^T_i and k^N_j is composed of various languages but shares identical semantics. In …
Figure 3: Overview of the multilingual parallel QA dataset generation pipeline. Knowledge within multilingual LLMs is often distributed across diverse languages instead of being confined to a single linguistic context. To simulate such a setting, we introduced a multilingual parallel dataset. Inspired by TOFU (Maini et al., 2024), we first generated 200 synthetic profiles to clearly isol…
Figure 4: Overview of the Knowledge Separability Score (KSS) and Knowledge Persistence Score (KPS). KSS-ROC measures the overall separability between the target and non-target knowledge, while KSS-PR evaluates how consistently the model assigns higher S_i to the target knowledge compared to the non-target knowledge. KPS quantifies the extent to which knowledge inaccessible in one language persists in another.
Figure 5: Distributions of S_i for both the target and non-target knowledge after NPO and pruning in Case 2. The first row represents the probability-based S_i, while the second row displays the generation-based S_i.
Figure 6: Naive prompt for synthetic profile Question and Answer generation.
Figure 7: Comparison of attribute distributions in synthetic profiles generated via naive prompting (Naive) and …
Figure 8: Example of synthetic profile.

        p1              p3              p5
l1      Case 1  Case 2  Case 1  Case 2  Case 1  Case 2
bn      1.00    1.00    0.38    0.89    0.10    0.91
de      0.00    0.00    0.00    0.00    0.50    1.00
en      0.00    0.00    1.00    1.00    0.00    0.00
he      0.00    1.00    0.00    1.00    0.30    0.97
ru      0.00    1.00    0.00    0.00    0.50    0.93
sq      0.00    1.00    0.00    0.86    0.30    0.89
ta      0.50    1.00    0.20    0.94    0.30    0.91
zh      0.00    0.00    0.50    0.71    0.33    1.00
avg     0.19    0.63    0.26    0.68    0.29    0.83
Figure 9: Prompt for generating QA dataset. You are an English QA equivalence judge. Decide whether the following two English {question / answer}s express the same meaning. Return ONLY a strict JSON object with a single boolean field named "equivalent" and nothing else, no explanations. Use exactly this format: { "equivalent": true/false } A. {question / answer}: {Original English Sentence} B. {question / answer}: {Bac…
Figure 10: Prompt for verifying back-translated sentences.
Figure 11: Prompt for semantic equivalence rate. Example of QA dataset: Q: What is the year of birth of Danielle Johnson? A: The year of birth of Danielle Johnson is 1985. Q: How does Danielle Johnson manage her finances based on her financial habits? A: Danielle Johnson uses a simple digital budget tool to manage her finances. Q: What are the learning goals for Danielle Johnson this year? A: The learning goals for D…
Figure 12: Examples of generated QA dataset.
Figure 13: Examples of translated multilingual QA dataset.
Figure 14: Distribution of generation-based S_i scores for p1. The plots illustrate the distributions for Hold-out Language (Hold-out) and Training Language (Training).
Figure 15: Distribution of generation-based S_i scores for p3. The plots illustrate the distributions for Hold-out Language (Hold-out) and Training Language (Training).
Figure 16: Distribution of generation-based S_i scores for p5. The plots illustrate the distributions for Hold-out Language (Hold-out) and Training Language (Training).
Figure 17: Distribution of probability-based S_i scores for p1. The plots illustrate the distributions for Hold-out Language (Hold-out) and Training Language (Training).
Figure 18: Distribution of probability-based S_i scores for p3. The plots illustrate the distributions for Hold-out Language (Hold-out) and Training Language (Training).
Figure 19: Distribution of probability-based S_i scores for p5. The plots illustrate the distributions for Hold-out Language (Hold-out) and Training Language (Training).
read the original abstract

While LLMs are increasingly used in commercial services, they pose privacy risks such as leakage of sensitive personally identifiable information (PII). For LLMs trained on multilingual corpora, Multilingual Machine Unlearning (MMU) aims to remove information across multiple languages. However, prior MMU evaluations fail to capture such cross-linguistic distribution of information, being largely limited to direct extensions of per-language evaluation protocols. To this end, we propose two metrics to evaluate the information spread across languages: the Knowledge Separability Score (KSS) and the Knowledge Persistence Score (KPS). KSS measures the overall unlearning quality across multiple languages, while KPS more specifically aims to assess consistent removal of information among different language pairs. We evaluated various unlearning methods in the multilingual setting with these metrics and conducted comprehensive analyses. Through our investigation, we provide insights into unique phenomena exclusive to MMU and offer a new perspective on MMU evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper argues that prior evaluations of multilingual machine unlearning (MMU) are limited to per-language protocols and fail to capture cross-linguistic information distribution. It introduces two new metrics—the Knowledge Separability Score (KSS) for overall unlearning quality across languages and the Knowledge Persistence Score (KPS) for consistent removal across language pairs—applies them to existing unlearning methods, and claims to reveal unique MMU phenomena.

Significance. If the metrics prove valid, they could fill a genuine gap in MMU evaluation by quantifying information spread and persistence across languages rather than treating languages independently. The work supplies no quantitative results, validation data, or external anchors in the abstract, so significance cannot yet be assessed beyond the conceptual framing.

major comments (2)
  1. [Abstract] The central claim that KSS and KPS 'capture cross-linguistic distribution of information' and 'provide insights into unique phenomena exclusive to MMU' is unsupported because no quantitative results, correlation coefficients with leakage probes, membership-inference rates, or human forgetting judgments are reported; the metrics are introduced and applied without external validation.
  2. [Abstract] The validity of KSS and KPS as measures of unlearning quality rests on the untested modeling choice that higher separability and lower persistence scores correspond to actual forgetting; no ablation or correlation study against concrete leakage metrics (e.g., PII extraction success) is described.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, clarifying the role of the abstract versus the full paper and outlining revisions to strengthen validation.

read point-by-point responses
  1. Referee: [Abstract] The central claim that KSS and KPS 'capture cross-linguistic distribution of information' and 'provide insights into unique phenomena exclusive to MMU' is unsupported because no quantitative results, correlation coefficients with leakage probes, membership-inference rates, or human forgetting judgments are reported; the metrics are introduced and applied without external validation.

    Authors: The abstract is a concise summary of the work. The full manuscript reports quantitative KSS and KPS values obtained by applying the metrics to multiple unlearning methods across languages, together with analyses that identify MMU-specific patterns such as asymmetric persistence between language pairs. We agree that the abstract would benefit from explicit numerical highlights. In revision we will add selected quantitative results and any available correlations to the abstract. revision: yes

  2. Referee: [Abstract] The validity of KSS and KPS as measures of unlearning quality rests on the untested modeling choice that higher separability and lower persistence scores correspond to actual forgetting; no ablation or correlation study against concrete leakage metrics (e.g., PII extraction success) is described.

    Authors: KSS and KPS are defined to quantify cross-lingual separability and persistence on the basis of the observed model outputs after unlearning. The manuscript demonstrates their discriminative power by comparing methods. We acknowledge that direct ablations correlating the scores with external leakage measures such as PII extraction success rates were not included. We will add these correlation analyses and ablations in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: KSS and KPS defined independently as new evaluation metrics

full rationale

The paper introduces KSS and KPS as novel metrics for assessing cross-lingual information spread and unlearning consistency in MMU. No equations, derivations, or steps in the abstract or description reduce these metrics to fitted parameters, self-citations, or their own inputs by construction. The metrics are presented as independent measures applied to existing unlearning methods, with no load-bearing self-citation chains or ansatzes smuggled in. The central claim rests on the definitions themselves rather than any forced equivalence, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the definitions of the two new scores and the assumption that they meaningfully quantify cross-lingual unlearning; no free parameters, axioms, or invented entities are specified in the abstract.

pith-pipeline@v0.9.0 · 5464 in / 1161 out tokens · 68570 ms · 2026-05-15T02:09:59.775135+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 10 internal anchors

  1. [1] Alfred V. Aho and Jeffrey D. Ullman. 1972. The Theory of Parsing, Translation, and Compiling. Prentice-Hall.

  2. [2] American Psychological Association. 1983. Publications Manual. American Psychological Association, Washington, DC.

  3. [3] Ashok K. Chandra, Dexter C. Kozen, and Larry J. Stockmeyer. 1981. Alternation. Journal of the ACM, 28(1):114–133. doi:10.1145/322234.322243

  4. [4] Galen Andrew and Jianfeng Gao. 2007. Scalable training of L1-regularized log-linear models. In Proceedings of the 24th International Conference on Machine Learning.

  5. [5] Dan Gusfield. 1997. Algorithms on Strings, Trees and Sequences. Cambridge University Press.

  6. [6] Mohammad Sadegh Rasooli and Joel R. Tetreault. 2015. Yara Parser: A fast and accurate dependency parser. Computing Research Repository, arXiv:1503.06733.

  7. [7] Rie Kubota Ando and Tong Zhang. 2005. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1853.

  8. [8] Hugo Touvron et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.

  9. [9] Machine unlearning: A comprehensive survey. arXiv preprint arXiv:2405.07406, 2024.

  10. [10] Joel Jang et al. 2022. Knowledge unlearning for mitigating privacy risks in language models. arXiv preprint arXiv:2210.01504.

  11. [11] Negative preference optimization: From catastrophic collapse to effective unlearning. arXiv preprint arXiv:2404.05868, 2024.

  12. [12] Pratyush Maini et al. 2024. TOFU: A task of fictitious unlearning for LLMs. arXiv preprint arXiv:2401.06121.

  13. [13] MUSE: Machine unlearning six-way evaluation for language models. arXiv preprint arXiv:2407.06460, 2024.

  14. [14] OpenAI. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774.

  15. [15] Gemini Team, Google. 2023. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805.

  16. [16] Who's Harry Potter? Approximate unlearning for LLMs.

  17. [17] Cross-lingual unlearning of selective knowledge in multilingual language models. arXiv preprint arXiv:2406.12354, 2024.

  18. [18] Continual learning and private unlearning. In Conference on Lifelong Learning Agents, 2022.

  19. [19] Qwen Team. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388.

  20. [20] Daniele Faraglia. 2025. Faker (software).

  21. [21] Learn and unlearn in multilingual LLMs. arXiv preprint arXiv:2406.13748, 2024.

  22. [22] Google Cloud. 2025.

  23. [23] R. Joshi et al. 2025. CultureGuard: Towards culturally-aware dataset and guard model for multilingual safety applications. arXiv preprint arXiv:2508.01710.

  24. [24] The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.

  25. [25] Edward J. Hu et al. 2022. LoRA: Low-rank adaptation of large language models. In ICLR.

  26. [26] Large language models are not robust multiple choice selectors. In The Twelfth International Conference on Learning Representations, 2024.

  27. [27] Xiaoman Pan, Boliang Zhang, Jonathan May, Joel Nothman, Kevin Knight, and Heng Ji. 2017. Cross-lingual name tagging and linking for 282 languages. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). doi:10.18653/v1/P17-1178

  28. [28] Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, and Francisco Guzmán. 2021. WikiMatrix: Mining 135M parallel sentences in 1620 language pairs from Wikipedia. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. doi:10.18653/v1/2021.eacl-main.115

  29. [29] Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out.

  30. [30] Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How multilingual is Multilingual BERT? arXiv preprint arXiv:1906.01502.

  31. [31] Dan Hendrycks et al. 2020. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300.

  32. [32] Benchmarking vision language model unlearning via fictitious facial identity dataset. arXiv preprint arXiv:2411.03554, 2024.

  33. [33] AWQ: Activation-aware weight quantization for on-device LLM compression and acceleration. In Proceedings of Machine Learning and Systems, 2024.

  34. [34] GPTQ: Accurate post-training quantization for generative pre-trained transformers. arXiv preprint arXiv:2210.17323, 2022.

  35. [35] GPT3.int8(): 8-bit matrix multiplication for transformers at scale. In Advances in Neural Information Processing Systems, 2022.

  36. [36] SmoothQuant: Accurate and efficient post-training quantization for large language models. In International Conference on Machine Learning, 2023.

  37. [37] QuaRot: Outlier-free 4-bit inference in rotated LLMs. In Advances in Neural Information Processing Systems, 2024.

  38. [38] Q-resafe: Assessing safety risks and quantization-aware safety patching for quantized large language models. arXiv preprint arXiv:2506.20251, 2025.

  39. [39] Alignment-aware quantization for LLM safety. arXiv preprint arXiv:2511.07842, 2025.

  40. [40] Catastrophic failure of LLM unlearning via quantization. arXiv preprint arXiv:2410.16454, 2024.

  41. [41] OpenUnlearning: Accelerating LLM unlearning via unified benchmarking of methods and metrics. arXiv preprint arXiv:2506.12618, 2025.

  42. [42] Efficient memory management for large language model serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023.

  43. [43] Uncovering the potential risks in unlearning: Danger of English-only unlearning in multilingual LLMs. arXiv preprint arXiv:2510.23949, 2025.

  44. [44] Dynamic collective intelligence learning: Finding efficient sparse model via refined gradients for pruned weights. arXiv preprint arXiv:2109.04660, 2021.

  45. [45] A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  46. [46] Dissecting language models: Machine unlearning via selective pruning. arXiv preprint arXiv:2403.01267, 2024.

  47. [47] Modality-aware neuron pruning for unlearning in multimodal large language models. arXiv preprint arXiv:2502.15910, 2025.

  48. [48] Jinwoo Kim, Jongyun Shin, Sangho An, and Jangho Kim. 2025. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. doi:10.1145/3746252.3761122

  49. [49] Rafael Rafailov et al. 2023. Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems.

  50. [50] Ashish Vaswani et al. 2017. Attention is all you need. In Advances in Neural Information Processing Systems.

  51. [51] Transformer feed-forward layers are key-value memories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.

  52. [52] Language-specific neurons: The key to multilingual capabilities in large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024.

  53. [53] NLLB Team. 2022. No language left behind: Scaling human-centered machine translation. arXiv preprint arXiv:2207.04672.

  54. [54] Protecting privacy in multimodal large language models with MLLMU-Bench. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers).

  55. [55] Fabian Pedregosa et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.