CCM atrix: Mining Billions of High-Quality Parallel Sentences on the Web

Holger Schwenk, Guillaume Wenzek, Sergey Edunov, Edouard Grave, Armand Joulin, Angela Fan · 2021 · DOI 10.18653/v1/2021.acl-long.507

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

open at publisher browse 5 citing papers

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

SEA-Embedding is a fully open text embedding pipeline for Southeast Asian languages that achieves state-of-the-art performance on the SEA-BED benchmark by analyzing data composition, training objectives, and base encoder choices.

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

cs.CL · 2024-02-05 · unverdicted · novelty 7.0

M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.

Text Embeddings by Weakly-Supervised Contrastive Pre-training

cs.CL · 2022-12-07 · unverdicted · novelty 5.0

E5 text embeddings trained with weakly-supervised contrastive pre-training on CCPairs outperform BM25 on BEIR zero-shot and achieve top results on MTEB, beating much larger models.

CAT-Translate: Building Compact Open-Source Models for Japanese-English Translation

cs.CL · 2026-06-19 · unverdicted · novelty 3.0

Compact 0.8B-7B models for bidirectional Japanese-English translation outperform large multilingual models on real-world domain benchmarks.

Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay

cs.CL · 2026-06-10 · unverdicted · novelty 3.0

Lius improves LLM translation for Kupang Malay by 4-13 points over baselines via continual instruction tuning with dictionary-derived instructions.

citing papers explorer

Showing 5 of 5 citing papers.

SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia cs.CL · 2026-06-02 · unverdicted · none · ref 41
SEA-Embedding is a fully open text embedding pipeline for Southeast Asian languages that achieves state-of-the-art performance on the SEA-BED benchmark by analyzing data composition, training objectives, and base encoder choices.
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation cs.CL · 2024-02-05 · unverdicted · none · ref 96
M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.
Text Embeddings by Weakly-Supervised Contrastive Pre-training cs.CL · 2022-12-07 · unverdicted · none · ref 53
E5 text embeddings trained with weakly-supervised contrastive pre-training on CCPairs outperform BM25 on BEIR zero-shot and achieve top results on MTEB, beating much larger models.
CAT-Translate: Building Compact Open-Source Models for Japanese-English Translation cs.CL · 2026-06-19 · unverdicted · none · ref 49
Compact 0.8B-7B models for bidirectional Japanese-English translation outperform large multilingual models on real-world domain benchmarks.
Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay cs.CL · 2026-06-10 · unverdicted · none · ref 53
Lius improves LLM translation for Kupang Malay by 4-13 points over baselines via continual instruction tuning with dictionary-derived instructions.

CCM atrix: Mining Billions of High-Quality Parallel Sentences on the Web

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer