pith. machine review for the scientific record.

Losing our Tail, Again: (Un)Natural Selection & Multilingual LLMs

1 Pith paper cites this work.

abstract

Multilingual Large Language Models have considerably changed how technologies influence language. While previous technologies could mediate or assist humans, there is now a tendency to offload the task of writing itself to these technologies, enabling models to change our languages more directly. While they provide quick access to information and impressively fluent output, beneath their (apparent) sophistication lies a subtle, insidious threat: the gradual decline and loss of linguistic diversity. In this position paper, I explore how model collapse, with a particular focus on translation technology, can lead to the loss of linguistic forms, grammatical features, and cultural nuance. Model collapse refers to the consequences of self-consuming training loops, where automatically generated data (re-)enters the training data, leading to a gradual distortion of the data distribution and the underrepresentation of low-probability linguistic phenomena. Drawing on recent work in Computer Vision, Natural Language Processing and Machine Translation, I argue that the many tails of our linguistic distributions might be vanishing, and with them, the narratives and identities they carry. This paper is a call to resist linguistic flattening and to reimagine Natural Language Processing as a field that encourages, values and protects expressive multilingual diversity and creativity.
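The self-consuming loop the abstract describes can be illustrated with a toy simulation (not from the paper itself): treat a "language" as a categorical distribution over forms, repeatedly fit that distribution to a finite sample of generated data, and resample from the fit. Because each generation only sees a finite synthetic corpus, low-probability forms can drop to zero counts and, once gone, never return. All names here (`resample_generation`, the toy corpus) are illustrative assumptions, not anything defined in the paper.

```python
import random
import collections

def resample_generation(counts, n_samples, rng):
    """One round of training on self-generated data: fit a categorical
    distribution to the observed counts, then draw a fresh synthetic
    corpus of n_samples items from it (sampling with replacement)."""
    population = list(counts.elements())  # each form repeated by its count
    return collections.Counter(rng.choices(population, k=n_samples))

rng = random.Random(0)
# Toy "language": one very common form and a long tail of rare forms.
corpus = collections.Counter(
    {"common": 960, "rare_a": 20, "rare_b": 15, "rare_c": 5}
)

for generation in range(30):
    corpus = resample_generation(corpus, n_samples=1000, rng=rng)

# Tail forms that hit zero counts in any generation are gone for good:
surviving = sorted(form for form, c in corpus.items() if c > 0)
print(surviving)
```

Run repeatedly with different seeds, the rare forms tend to vanish while the dominant form persists; the tail of the distribution is exactly what finite self-sampling erodes first, which is the mechanism behind the "vanishing tails" argument above.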

fields

cs.CL 1

years

2026 1

verdicts

CONDITIONAL 1

representative citing papers

Is She Even Relevant? When BERT Ignores Explicit Gender Cues

cs.CL · 2026-05-08 · conditional · novelty 7.0

A Dutch BERT model encodes gender linearly by epoch 20 but does not dynamically update its representations when explicit female cues contradict learned stereotypical associations in short sentence templates.

citing papers explorer

Showing 1 of 1 citing paper.

  • Is She Even Relevant? When BERT Ignores Explicit Gender Cues cs.CL · 2026-05-08 · conditional · none · ref 53 · internal anchor