Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders
Pith reviewed 2026-05-09 15:00 UTC · model grok-4.3
The pith
Replacing text demonstrations with their continuous embeddings in in-context prompts during contrastive training lets LLMs produce strong text embeddings even when no prompts are given at inference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By substituting the continuous embeddings of a few task-related demonstrations for their original text tokens inside the in-context prompt, and then performing contrastive training on this augmented input, the LLM learns both to align semantically related text pairs and to treat the supplied embeddings as meaningful context. Consequently, the trained model produces competitive embeddings whether or not demonstration embeddings are supplied at inference time.
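To make the mechanism concrete, here is a minimal sketch of the input construction. All shapes, names, and the separator convention are invented for illustration; the paper may splice the vectors differently:

```python
import numpy as np

def build_epic_input(query_token_embs, demo_embs, sep_emb):
    """Splice one continuous vector per demonstration (plus a separator)
    ahead of the query's ordinary token embeddings. Hypothetical layout."""
    parts = []
    for e in demo_embs:
        parts.append(e[None, :])       # each demonstration occupies ONE position
        parts.append(sep_emb[None, :])
    parts.append(query_token_embs)     # the query keeps its full token sequence
    return np.concatenate(parts, axis=0)

d = 8
query = np.random.randn(5, d)                   # 5 query tokens
demos = [np.random.randn(d) for _ in range(3)]  # 3 demonstration embeddings
x = build_epic_input(query, demos, np.zeros(d))
print(x.shape)  # 3 demos * 2 positions + 5 query tokens -> (11, 8)
```

The point of the sketch: each demonstration contributes a constant number of positions regardless of how many tokens its original text contained.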
What carries the argument
EPIC training, which inserts continuous demonstration embeddings directly into the in-context prompt during contrastive learning so the model must interpret those vectors semantically.
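The contrastive component is typically an in-batch InfoNCE objective over query-passage pairs; the sketch below is a generic version, not the paper's exact loss (the temperature value and batch layout are assumptions):

```python
import numpy as np

def info_nce(q, p, temperature=0.05):
    """In-batch InfoNCE: each query's positive is the same-index passage;
    every other passage in the batch serves as a negative."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = q @ p.T / temperature                       # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # NLL of the positives

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))
loss = info_nce(q, q.copy())   # identical pairs -> near-zero loss
print(loss > 0)  # True
```

Minimizing this loss pulls each query toward its paired passage and away from in-batch negatives, which is the alignment pressure the EPIC prompt rides on.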
If this is right
- EPIC-trained models achieve strong embedding quality both with and without in-context prompts at inference time.
- The method sets new state-of-the-art results on the MTEB benchmark while using only publicly available retrieval data.
- Token consumption drops during both training and inference because each demonstration's text is replaced by a single continuous embedding vector.
- Ablation experiments confirm that the embedding-based prompt mechanism is necessary for the observed gains.
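The token-consumption point above follows from simple arithmetic. With made-up lengths (100 tokens per demonstration, a 30-token query, 3 demonstrations), replacing each demonstration with a single embedding slot shrinks the prompt by an order of magnitude:

```python
def prompt_length(num_demos, tokens_per_demo, query_tokens, use_embeddings):
    """Illustrative position count: a demonstration occupies either its
    full token sequence or a single embedding slot. Numbers are made up."""
    per_demo = 1 if use_embeddings else tokens_per_demo
    return num_demos * per_demo + query_tokens

text_len = prompt_length(3, 100, 30, use_embeddings=False)
epic_len = prompt_length(3, 100, 30, use_embeddings=True)
print(text_len, epic_len)  # 330 33
```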
Where Pith is reading between the lines
- The same substitution trick could be tested on other in-context tasks that currently suffer from long demonstration sequences.
- Success would imply that LLMs can acquire a general ability to treat continuous vectors as contextual signals rather than learning only surface token patterns.
- If the pattern holds, embedding pipelines could drop the requirement for large context windows at deployment.
Load-bearing premise
Feeding continuous demonstration embeddings inside the training prompt will cause the LLM to learn to read those embeddings as semantic context and to generalize to high-quality embedding generation when no demonstrations are present at inference.
What would settle it
If EPIC-trained models show no improvement over ordinary contrastive fine-tuning when both are evaluated without any in-context demonstrations at inference time.
Original abstract
Large language models (LLMs) have been widely explored for embedding generation. While recent studies show that in-context learning (ICL) effectively enhances the representational capability of LLMs by prepending a few task-related demonstrations, it causes substantial token overhead due to the increased sequence length. In this work, we propose EPIC, a novel embedding-based in-context prompt training strategy that leverages ICL to generate high-quality embeddings while reducing computational burden during both training and inference. This approach replaces discrete text demonstrations with their corresponding continuous embeddings, which not only encourages the LLM to align semantically-related text pairs during contrastive learning, but also requires the model to interpret demonstration embeddings as part of the in-context prompt. Consequently, EPIC-trained models achieve excellent embedding performance both with or without in-context prompts at inference time. Comprehensive experiments demonstrate that our method establishes new state-of-the-art results on the MTEB benchmark, surpassing frontier models trained solely on publicly available retrieval data. Extensive ablation studies further validate the effectiveness and necessity of our mechanism.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes EPIC, an embedding-based in-context prompt training method for using LLMs as text encoders. It replaces discrete text demonstrations with their continuous embeddings during contrastive training, arguing that this teaches the model to interpret those embeddings semantically. The authors report that EPIC-trained models achieve strong embedding performance both with and without in-context prompts at inference, establishing new state-of-the-art results on the MTEB benchmark and surpassing frontier models trained only on public retrieval data.
Significance. If the central mechanism holds, the work would be significant for reducing token overhead in ICL-based embedding while preserving gains, offering a practical advance over standard contrastive fine-tuning of LLMs for retrieval and embedding tasks. The reported SOTA on MTEB and ablation studies (even if incomplete) provide a concrete benchmark for future prompt-training methods.
Major comments (1)
- [Ablation studies] The headline claim, that prepending continuous demonstration embeddings during contrastive training causes the LLM to learn a semantic interpretation of those vectors (yielding strong no-prompt inference), is not isolated from the contrastive loss itself. No matched baseline trained with standard contrastive alignment alone, without the embedding prompts, is reported, so the no-prompt gains may arise solely from the contrastive objective rather than from the proposed in-context mechanism.
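The missing control can be stated as a condition grid (labels hypothetical): crossing training-prompt style with inference-prompt style, the mechanism is isolated only if a plain-contrastive no-prompt row sits alongside the EPIC no-prompt row:

```python
# Hypothetical ablation grid: (train_prompt, infer_prompt) per condition.
GRID = [
    ("embedding", "embedding"),  # full EPIC
    ("embedding", "none"),       # EPIC evaluated prompt-free
    ("none",      "none"),       # the referee's matched contrastive baseline
]

def isolates_mechanism(grid):
    """The mechanism is isolated only if both no-prompt conditions are
    present, so their score gap is attributable to the prompt training."""
    no_prompt_training_styles = {t for (t, i) in grid if i == "none"}
    return {"embedding", "none"} <= no_prompt_training_styles

print(isolates_mechanism(GRID))       # True: the gap is interpretable
print(isolates_mechanism(GRID[:2]))   # False: baseline row missing
```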
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address the major comment on ablation studies below and will revise the paper accordingly to strengthen the isolation of our proposed mechanism.
Point-by-point responses
Referee: [Ablation studies] The headline claim, that prepending continuous demonstration embeddings during contrastive training causes the LLM to learn a semantic interpretation of those vectors (yielding strong no-prompt inference), is not isolated from the contrastive loss itself. No matched baseline trained with standard contrastive alignment alone, without the embedding prompts, is reported, so the no-prompt gains may arise solely from the contrastive objective rather than from the proposed in-context mechanism.
Authors: We agree that a matched baseline consisting of standard contrastive fine-tuning on the same data, without prepending any demonstration embeddings, is necessary to fully isolate the contribution of the embedding-based in-context prompt training. While the manuscript reports comparisons against several existing embedding methods and includes ablations on factors such as the number of demonstrations, we did not include this specific control condition. We will add the requested baseline in the revised version to demonstrate that the no-prompt inference gains are attributable to the proposed mechanism rather than to the contrastive objective alone. Revision: yes.
Circularity Check
No circularity: training procedure validated on external MTEB benchmark
Full rationale
The paper introduces EPIC, a contrastive training procedure that substitutes discrete demonstrations with continuous embeddings in in-context prompts. Claims of improved embedding quality (with or without prompts at inference) rest on empirical results against the external MTEB benchmark and ablation studies, not on any derivation that reduces to fitted parameters, self-definitions, or self-citation chains. No equations are presented that equate the target performance to the training inputs by construction; the method's success is measured independently of its own fitted values.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: contrastive learning on semantically related pairs improves embedding quality.
- Domain assumption: LLMs can interpret continuous embedding vectors as meaningful in-context information.