pith. machine review for the scientific record.

arxiv: 2605.00618 · v1 · submitted 2026-05-01 · 💻 cs.CL


Is Textual Similarity Invariant under Machine Translation? Evidence Based on the Political Manifesto Corpus

Albert Leśniak, Damian Brzyski, Daria Boratyn, Dariusz Stolicki, Jan Rybicki, Maciej Rapacz, Wojciech Łukasik, Wojciech Słomczyński

Pith reviewed 2026-05-09 19:26 UTC · model grok-4.3

classification 💻 cs.CL
keywords machine translation · textual similarity · embedding models · political manifestos · cosine similarity · invariance test · multilingual NLP

The pith

Machine translation preserves cosine similarities between paragraph embeddings in ten languages but distorts them in four, based on a stability test using the Political Manifesto Corpus.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether machine translation keeps the relationships between paragraph embeddings stable, rather than checking if individual meanings stay the same. It does this by comparing how consistent similarity scores are across different embedding models, using the natural variation between those models on original texts as a benchmark for acceptable change. If the variation introduced by translation stays within that benchmark for a given language, the similarity structure is considered invariant under translation. A sympathetic reader would care because many multilingual analyses rely on translating texts into one language for embedding comparisons, and knowing which languages allow this without distorting relationships helps decide when that shortcut works. The method gives clear verdicts per language instead of a single yes-or-no answer.
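The calibration logic described above can be sketched numerically. This is an illustrative reconstruction, not the authors' code: the embeddings are random stand-ins, and the function and variable names are invented.

```python
# Minimal sketch of the invariance check: inter-model disagreement on
# original text sets the threshold, translation-induced shift is tested
# against it. Random vectors stand in for real paragraph embeddings.
import numpy as np

def cosine_sim_matrix(X):
    """Pairwise cosine similarities between rows of X."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T

def mean_abs_shift(S1, S2):
    """Average absolute change in off-diagonal pairwise similarities."""
    mask = ~np.eye(S1.shape[0], dtype=bool)
    return np.abs(S1 - S2)[mask].mean()

rng = np.random.default_rng(0)
n, d = 50, 128
emb_model_a_orig = rng.normal(size=(n, d))   # model A on originals
emb_model_b_orig = rng.normal(size=(n, d))   # model B on originals
# model A on translations, simulated as a small perturbation
emb_model_a_trans = emb_model_a_orig + 0.05 * rng.normal(size=(n, d))

# Calibration: how much do models disagree on untranslated text?
threshold = mean_abs_shift(cosine_sim_matrix(emb_model_a_orig),
                           cosine_sim_matrix(emb_model_b_orig))

# Test: does translation shift similarities more than that?
shift = mean_abs_shift(cosine_sim_matrix(emb_model_a_orig),
                       cosine_sim_matrix(emb_model_a_trans))

verdict = "invariant" if shift <= threshold else "distorted"
print(f"threshold={threshold:.3f} shift={shift:.3f} -> {verdict}")
```

In the paper this comparison is run per language, so each language gets its own threshold and verdict rather than a single global decision.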

Core claim

The authors develop a per-language non-inferiority test that checks four hypotheses about translation effects on embedding similarities. Using over 2,800 manifestos in 28 languages translated to English, they measure stability of pairwise cosine similarities across embedding models and calibrate the invariance threshold with inter-model disagreement on original text. This identifies ten languages where translation demonstrably preserves semantic structure and four where it demonstrably degrades it.

What carries the argument

The non-inferiority test for invariance, which treats inter-model disagreement on untranslated text as the threshold for acceptable translation-induced change in pairwise similarities.
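A hedged sketch of how such a per-language non-inferiority test could be run, using a large-sample normal approximation in the spirit of TOST-style equivalence testing (the paper's exact statistic is not reproduced here; the shift data and margin are invented):

```python
# One-sided non-inferiority test: H0 says the mean translation-induced
# shift is at least the calibrated margin delta; rejecting H0 supports
# invariance. Normal approximation; all numbers are illustrative.
from statistics import NormalDist, mean, stdev
import random

random.seed(1)
# Hypothetical per-pair similarity shifts for one language after translation.
shifts = [abs(random.gauss(0.02, 0.01)) for _ in range(200)]
delta = 0.05  # invariance margin, standing in for inter-model disagreement

# H0: mean shift >= delta  (translation distorts at least as much as the margin)
# H1: mean shift <  delta  (non-inferior: translation stays within the margin)
m, s, n = mean(shifts), stdev(shifts), len(shifts)
z = (m - delta) / (s / n ** 0.5)
p = NormalDist().cdf(z)  # one-sided p-value for H1

verdict = "invariant" if p < 0.05 else "indeterminate"
print(f"mean shift={m:.4f}, p={p:.3g} -> {verdict}")
```

A language whose shifts fail to clear the margin would land in the "indeterminate" bin, matching the paper's three-way verdicts (invariant / indeterminate / distorted).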

If this is right

  • Translated texts can be used for similarity-based tasks in the ten invariant languages without detectable loss of semantic structure.
  • The four languages with distortion require caution or original-language processing for reliable embedding comparisons.
  • The framework can be applied to other corpora and pipelines to test invariance without needing direct semantic shift measurements.
  • Downstream tasks like clustering or retrieval across languages become more trustworthy when limited to invariant languages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.


  • If the test generalizes, researchers could screen new translation services or embedding models by running the same inter-model stability check.
  • This approach might extend to measuring invariance under other transformations like summarization or paraphrasing.
  • Knowing language-specific invariance could guide choices in building multilingual datasets or models.
  • Future work might correlate these findings with linguistic features of the languages to predict invariance without full testing.

Load-bearing premise

That the amount of disagreement between different embedding models on original-language texts provides the correct standard for deciding what level of change under translation is still acceptable.

What would settle it

Running the same analysis but with human-annotated similarity judgments on a subset of paragraphs instead of model-based ones, and checking whether the language verdicts match the automated test.
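As a toy illustration of that validation step, one could re-derive the per-language verdicts from human similarity judgments and report agreement with the automated test. All verdicts below are fabricated placeholders, not the paper's results:

```python
# Compare automated verdicts against verdicts recomputed from hypothetical
# human similarity judgments; report simple per-language agreement.
automated = {"Czech": "invariant", "Danish": "invariant",
             "English": "distorted", "Croatian": "indeterminate"}
human_based = {"Czech": "invariant", "Danish": "distorted",
               "English": "distorted", "Croatian": "indeterminate"}

matches = sum(automated[lang] == human_based[lang] for lang in automated)
agreement = matches / len(automated)
print(f"verdict agreement: {agreement:.0%}")
```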

read the original abstract

We investigate the extent to which cosine similarity between paragraph embeddings is invariant under machine translation, using the Manifesto Corpus of over 2,800 political party platforms in 28 languages translated to English via the EU eTranslation service. Rather than measuring translation-induced semantic shift directly we measure the stability of pairwise similarity relationships across embedding models, and use inter-model disagreement on original-language text as a calibrated invariance threshold. This yields a per-language non-inferiority test for four hypotheses about how translation interacts with embedding choice, with verdicts that distinguish languages where translation demonstrably preserves semantic structure from those where it demonstrably degrades it and from those where the available evidence does not resolve the question. The framework is corpus- and pipeline-agnostic and extends naturally to downstream tasks. Applied to our data, it identifies ten languages with translation invariance and four with detectable distortion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper investigates whether cosine similarity between paragraph embeddings remains invariant under machine translation. Using the Manifesto Corpus (>2,800 texts in 28 languages translated to English via EU eTranslation), it measures stability of pairwise similarities across embedding models and sets an invariance threshold from inter-model disagreement on original-language texts. This supports a per-language non-inferiority test distinguishing languages with preserved semantic structure (10 languages), detectable distortion (4 languages), and inconclusive cases. The framework is described as corpus- and pipeline-agnostic with potential extension to downstream tasks.

Significance. If the threshold choice is justified, the work supplies a practical statistical framework for quantifying MT effects on embedding-based similarity relations, which is relevant for multilingual NLP and computational social science applications involving political texts. The large-scale, real-world corpus and explicit non-inferiority formulation add empirical value; the agnostic design allows reuse beyond the current setting.

major comments (1)
  1. [§4] §4 (non-inferiority test definition): The invariance threshold is set directly to inter-model disagreement on untranslated paragraphs. No external anchor (human similarity judgments, synthetic distortion controls, or downstream task stability) is provided to show that exceeding this threshold corresponds to semantically meaningful change rather than model-specific encoding differences. This assumption is load-bearing for the reported classification of 10 invariant vs. 4 distorted languages.
minor comments (2)
  1. [§3.1] §3.1: Paragraph segmentation rules from the manifesto texts are not specified in sufficient detail to support exact reproduction of the pairwise similarity matrices.
  2. [Results tables] Results tables: The per-language verdicts lack accompanying effect sizes, exact threshold values, or confidence intervals, making it difficult to assess the margin by which each language passes or fails the test.
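One way to address the major comment without collecting human annotations would be a synthetic distortion control: inject perturbations of known magnitude into the embeddings and check that the stability measure crosses the threshold once distortion becomes large. A minimal sketch, with invented noise levels and an invented threshold:

```python
# Synthetic distortion control: a test calibrated against a threshold should
# ignore tiny perturbations and flag large ones. Purely illustrative.
import numpy as np

rng = np.random.default_rng(42)
n, d = 40, 64
X = rng.normal(size=(n, d))  # stand-in paragraph embeddings

def sims(X):
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def shift(A, B):
    mask = ~np.eye(A.shape[0], dtype=bool)
    return np.abs(sims(A) - sims(B))[mask].mean()

threshold = 0.05  # stand-in for the inter-model calibration
flagged = {}
for noise in (0.01, 0.1, 1.0):
    distorted = X + noise * rng.normal(size=(n, d))
    flagged[noise] = shift(X, distorted) > threshold
print(flagged)
```

If the test fails to flag heavy synthetic distortion, the threshold is too loose; if it flags negligible noise, it is too strict. Either outcome would bear directly on the 10-vs-4 language classification.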

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and for identifying a key methodological point. We address the major comment below, providing additional justification for our calibration approach while acknowledging the value of external validation. We plan a partial revision to expand the discussion in §4.

read point-by-point responses
  1. Referee: [§4] §4 (non-inferiority test definition): The invariance threshold is set directly to inter-model disagreement on untranslated paragraphs. No external anchor (human similarity judgments, synthetic distortion controls, or downstream task stability) is provided to show that exceeding this threshold corresponds to semantically meaningful change rather than model-specific encoding differences. This assumption is load-bearing for the reported classification of 10 invariant vs. 4 distorted languages.

    Authors: We appreciate the referee drawing attention to the calibration of the invariance threshold. Our decision to set the threshold using inter-model disagreement on the original-language paragraphs is intentional: it quantifies the baseline variability in pairwise cosine similarities that arises solely from differences in embedding model architectures and training data, without any translation. Any additional deviation observed after machine translation can therefore be interpreted as exceeding the level of change already attributable to model choice. This internal calibration supports the non-inferiority test by providing a language- and corpus-specific reference point that does not require external human annotations, which would be difficult to obtain consistently across 28 languages and would compromise the framework's agnostic design. We agree that linking the threshold to downstream task performance or human judgments would offer stronger semantic grounding and will revise §4 to include an expanded justification of the current approach together with explicit suggestions for such external anchors in future work. The classification results themselves remain unchanged, as they follow directly from the per-language statistical tests against this calibrated threshold.

    revision: partial

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper computes an invariance threshold from inter-model disagreement on original-language paragraphs and applies a non-inferiority test to measure whether translation-induced shifts in pairwise cosine similarities exceed that threshold. This construction is independent: the threshold is fixed from untranslated data before examining translations, so the per-language verdicts (invariant vs. distorted) are not equivalent to the inputs by definition or by fitting. No self-definitional steps, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness theorems, or smuggled ansatzes appear in the described framework. The method is presented as corpus-agnostic and externally extensible, confirming the logic remains self-contained against the chosen proxy.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claim rests on standard domain assumptions about embeddings and similarity measures plus one key calibration choice; no free parameters or invented entities are introduced in the abstract.

axioms (3)
  • domain assumption Cosine similarity between paragraph embeddings captures meaningful semantic relationships between texts
    This is the core measure used to define textual similarity throughout the study.
  • domain assumption Inter-model disagreement on original-language texts supplies a valid and calibrated threshold for acceptable change under translation
    This assumption directly defines the non-inferiority tests and the resulting language verdicts.
  • domain assumption The EU eTranslation outputs are representative machine translations suitable for testing semantic-structure preservation
    The experiment treats these translations as the intervention whose effect is being measured.

pith-pipeline@v0.9.0 · 5482 in / 1435 out tokens · 36626 ms · 2026-05-09T19:26:32.819456+00:00 · methodology

