Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders
Pith reviewed 2026-05-09 15:00 UTC · model grok-4.3
The pith
Replacing text demonstrations with their continuous embeddings in in-context prompts during contrastive training lets LLMs produce strong text embeddings even when no prompts are given at inference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By substituting the continuous embeddings of a few task-related demonstrations for their original text tokens inside the in-context prompt, and then performing contrastive training on this augmented input, the LLM learns both to align semantically related text pairs and to treat the supplied embeddings as meaningful context. Consequently, the trained model produces competitive embeddings whether or not demonstration embeddings are supplied at inference time.
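To make the mechanism concrete, here is a minimal sketch of the input construction. All shapes, names, and the separator convention are invented for illustration; the paper may splice the vectors differently:

```python
import numpy as np

def build_epic_input(query_token_embs, demo_embs, sep_emb):
    """Splice one continuous vector per demonstration (plus a separator)
    ahead of the query's ordinary token embeddings. Hypothetical layout."""
    parts = []
    for e in demo_embs:
        parts.append(e[None, :])       # each demonstration occupies ONE position
        parts.append(sep_emb[None, :])
    parts.append(query_token_embs)     # the query keeps its full token sequence
    return np.concatenate(parts, axis=0)

d = 8
query = np.random.randn(5, d)                   # 5 query tokens
demos = [np.random.randn(d) for _ in range(3)]  # 3 demonstration embeddings
x = build_epic_input(query, demos, np.zeros(d))
print(x.shape)  # 3 demos * 2 positions + 5 query tokens -> (11, 8)
```

The point of the sketch: each demonstration contributes a constant number of positions regardless of how many tokens its original text contained.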
What carries the argument
EPIC training, which inserts continuous demonstration embeddings directly into the in-context prompt during contrastive learning so the model must interpret those vectors semantically.
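The contrastive component is typically an in-batch InfoNCE objective over query-passage pairs; the sketch below is a generic version, not the paper's exact loss (the temperature value and batch layout are assumptions):

```python
import numpy as np

def info_nce(q, p, temperature=0.05):
    """In-batch InfoNCE: each query's positive is the same-index passage;
    every other passage in the batch serves as a negative."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = q @ p.T / temperature                       # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # NLL of the positives

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))
loss = info_nce(q, q.copy())   # identical pairs -> near-zero loss
print(loss > 0)  # True
```

Minimizing this loss pulls each query toward its paired passage and away from in-batch negatives, which is the alignment pressure the EPIC prompt rides on.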
If this is right
- EPIC-trained models achieve strong embedding quality both with and without in-context prompts at inference time.
- The method sets new state-of-the-art results on the MTEB benchmark while using only publicly available retrieval data.
- Token consumption drops during both training and inference because each demonstration's text is replaced by a single continuous embedding vector.
- Ablation experiments confirm that the embedding-based prompt mechanism is necessary for the observed gains.
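The token-consumption point above follows from simple arithmetic. With made-up lengths (100 tokens per demonstration, a 30-token query, 3 demonstrations), replacing each demonstration with a single embedding slot shrinks the prompt by an order of magnitude:

```python
def prompt_length(num_demos, tokens_per_demo, query_tokens, use_embeddings):
    """Illustrative position count: a demonstration occupies either its
    full token sequence or a single embedding slot. Numbers are made up."""
    per_demo = 1 if use_embeddings else tokens_per_demo
    return num_demos * per_demo + query_tokens

text_len = prompt_length(3, 100, 30, use_embeddings=False)
epic_len = prompt_length(3, 100, 30, use_embeddings=True)
print(text_len, epic_len)  # 330 33
```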
Where Pith is reading between the lines
- The same substitution trick could be tested on other in-context tasks that currently suffer from long demonstration sequences.
- Success would imply that LLMs can acquire a general ability to treat continuous vectors as contextual signals rather than learning only surface token patterns.
- If the pattern holds, embedding pipelines could drop the requirement for large context windows at deployment.
Load-bearing premise
Feeding continuous demonstration embeddings inside the training prompt will cause the LLM to learn to read those embeddings as semantic context and to generalize to high-quality embedding generation when no demonstrations are present at inference.
What would settle it
If EPIC-trained models show no improvement over ordinary contrastive fine-tuning when both are evaluated without any in-context demonstrations at inference time.
Original abstract
Large language models (LLMs) have been widely explored for embedding generation. While recent studies show that in-context learning (ICL) effectively enhances the representational capability of LLMs by prepending a few task-related demonstrations, it causes substantial token overhead due to the increased sequence length. In this work, we propose EPIC, a novel embedding-based in-context prompt training strategy that leverages ICL to generate high-quality embeddings while reducing computational burden during both training and inference. This approach replaces discrete text demonstrations with their corresponding continuous embeddings, which not only encourages the LLM to align semantically-related text pairs during contrastive learning, but also requires the model to interpret demonstration embeddings as part of the in-context prompt. Consequently, EPIC-trained models achieve excellent embedding performance both with or without in-context prompts at inference time. Comprehensive experiments demonstrate that our method establishes new state-of-the-art results on the MTEB benchmark, surpassing frontier models trained solely on publicly available retrieval data. Extensive ablation studies further validate the effectiveness and necessity of our mechanism.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes EPIC, an embedding-based in-context prompt training method for using LLMs as text encoders. It replaces discrete text demonstrations with their continuous embeddings during contrastive training, arguing that this teaches the model to interpret those embeddings semantically. The authors report that EPIC-trained models achieve strong embedding performance both with and without in-context prompts at inference, establishing new state-of-the-art results on the MTEB benchmark and surpassing frontier models trained only on public retrieval data.
Significance. If the central mechanism holds, the work would be significant for reducing token overhead in ICL-based embedding while preserving gains, offering a practical advance over standard contrastive fine-tuning of LLMs for retrieval and embedding tasks. The reported SOTA on MTEB and ablation studies (even if incomplete) provide a concrete benchmark for future prompt-training methods.
Major comments (1)
- [Ablation studies] The headline claim, that prepending continuous demonstration embeddings during contrastive training causes the LLM to learn a semantic interpretation of those vectors (yielding strong no-prompt inference), is not isolated from the contrastive loss itself. No matched baseline trained with standard contrastive alignment alone, without the embedding prompts, is reported, so the no-prompt gains may arise solely from the contrastive objective rather than from the proposed in-context mechanism.
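The missing control can be stated as a condition grid (labels hypothetical): crossing training-prompt style with inference-prompt style, the mechanism is isolated only if a plain-contrastive no-prompt row sits alongside the EPIC no-prompt row:

```python
# Hypothetical ablation grid: (train_prompt, infer_prompt) per condition.
GRID = [
    ("embedding", "embedding"),  # full EPIC
    ("embedding", "none"),       # EPIC evaluated prompt-free
    ("none",      "none"),       # the referee's matched contrastive baseline
]

def isolates_mechanism(grid):
    """The mechanism is isolated only if both no-prompt conditions are
    present, so their score gap is attributable to the prompt training."""
    no_prompt_training_styles = {t for (t, i) in grid if i == "none"}
    return {"embedding", "none"} <= no_prompt_training_styles

print(isolates_mechanism(GRID))       # True: the gap is interpretable
print(isolates_mechanism(GRID[:2]))   # False: baseline row missing
```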
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address the major comment on ablation studies below and will revise the paper accordingly to strengthen the isolation of our proposed mechanism.
Point-by-point responses
Referee: [Ablation studies] The headline claim, that prepending continuous demonstration embeddings during contrastive training causes the LLM to learn a semantic interpretation of those vectors (yielding strong no-prompt inference), is not isolated from the contrastive loss itself. No matched baseline trained with standard contrastive alignment alone, without the embedding prompts, is reported, so the no-prompt gains may arise solely from the contrastive objective rather than from the proposed in-context mechanism.
Authors: We agree that a matched baseline consisting of standard contrastive fine-tuning on the same data, without prepending any demonstration embeddings, is necessary to fully isolate the contribution of the embedding-based in-context prompt training. While the manuscript reports comparisons against several existing embedding methods and includes ablations on factors such as the number of demonstrations, we did not include this specific control condition. We will add the requested baseline in the revised version to demonstrate that the no-prompt inference gains are attributable to the proposed mechanism rather than to the contrastive objective alone. Revision: yes.
Circularity Check
No circularity: training procedure validated on external MTEB benchmark
Full rationale
The paper introduces EPIC, a contrastive training procedure that substitutes discrete demonstrations with continuous embeddings in in-context prompts. Claims of improved embedding quality (with or without prompts at inference) rest on empirical results against the external MTEB benchmark and ablation studies, not on any derivation that reduces to fitted parameters, self-definitions, or self-citation chains. No equations are presented that equate the target performance to the training inputs by construction; the method's success is measured independently of its own fitted values.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: contrastive learning on semantically related pairs improves embedding quality.
- Domain assumption: LLMs can interpret continuous embedding vectors as meaningful in-context information.