arxiv: 2510.07048 · v2 · submitted 2025-10-08 · 💻 cs.CL · cs.AI

Search-R3: Unifying Reasoning and Embedding in Large Language Models

Yuntao Gui, James Cheng

Pith reviewed 2026-05-18 09:22 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords large language models, information retrieval, embeddings, chain of thought, reinforcement learning, unified training, knowledge-intensive tasks


The pith

Search-R3 trains LLMs to output retrieval embeddings directly as the result of step-by-step reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Search-R3 as a way to make large language models handle retrieval tasks by producing embeddings inside their own reasoning process rather than as a separate step. It combines an initial supervised stage that teaches the model to emit useful embeddings with a subsequent reinforcement learning stage that jointly improves both the reasoning chain and the resulting embeddings. A specialized training environment lets the embeddings evolve across iterations without forcing a full re-encoding of the entire corpus each time. The authors report that this integrated approach outperforms, on a range of retrieval benchmarks, earlier methods that keep reasoning and embedding generation apart. A sympathetic reader would care because many knowledge-intensive applications need both careful analysis and accurate lookup, and unifying the two inside one model could remove the need for separate retrieval components.

Core claim

Search-R3 adapts LLMs so that they generate search embeddings as a direct output of their chain-of-thought reasoning. The framework uses three mechanisms: supervised learning to establish basic embedding quality, reinforcement learning that optimizes embedding generation together with the reasoning steps, and a specialized RL environment that manages changing embeddings without requiring complete corpus re-encoding at every training iteration. Evaluations on diverse benchmarks show that this unification produces stronger retrieval results than prior methods that treat reasoning and embedding as separate processes.

What carries the argument

The Search-R3 framework, which treats embedding generation as an extension of the LLM's chain-of-thought reasoning, establishes it through supervised learning, and then optimizes it jointly with reasoning in a specialized RL environment that avoids full corpus re-encoding.

If this is right

LLMs can be post-trained to perform retrieval without maintaining a separate embedding model.
Chain-of-thought analysis can directly shape embedding vectors for more semantically precise retrieval.
Reinforcement learning can jointly tune reasoning quality and embedding usefulness in a single loop.
Training remains practical because embeddings can be updated incrementally without re-processing the full corpus each iteration.
Complex knowledge-intensive tasks become feasible inside one model rather than through stitched-together components.
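The RL loop in the bullets above can be made concrete with a small reward sketch. The only description of the training signal on this page is the Lean-citation passage further down, which names GRPO with a DCG-scaled reward on query-positive-negative triplets, so the exact formula here is an assumption: one rollout's reward is the DCG gain 1/log2(rank + 1) of the positive document's rank among the positive and its negatives, and advantages are standardized within the group of rollouts sampled for the same query, as in GRPO. The function names are hypothetical, and the policy-gradient and KL terms of full GRPO are omitted.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def dcg_reward(pos_score: float, neg_scores: Sequence[float]) -> float:
    """Assumed DCG-style reward: 1 / log2(rank + 1), where rank is the 1-based
    rank of the positive document when the positive and the negatives are sorted
    by similarity to the query embedding. Ties are counted against the positive."""
    rank = 1 + sum(1 for s in neg_scores if s >= pos_score)
    return 1.0 / math.log2(rank + 1)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward against the other
    rollouts sampled for the same query."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one query: (positive score, negative scores).
rollouts = [
    (0.90, [0.10, 0.20, 0.30]),  # positive ranked 1st -> reward 1.0
    (0.50, [0.60, 0.20, 0.10]),  # ranked 2nd -> 1/log2(3)
    (0.40, [0.60, 0.50, 0.10]),  # ranked 3rd -> 0.5
    (0.05, [0.60, 0.50, 0.10]),  # ranked 4th -> 1/log2(5)
]
rewards = [dcg_reward(p, negs) for p, negs in rollouts]
advantages = group_relative_advantages(rewards)
```

Standardizing within the group rewards a rollout for beating its siblings rather than for an absolute score; using the group as the baseline is what lets GRPO do without a learned value model.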

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same unification pattern might let LLMs produce structured outputs other than embeddings, such as database queries or code snippets, directly from reasoning.
If the RL environment generalizes across domains, the method could reduce the current separation between reasoning models and retrieval indexes in production systems.
A natural next test would be whether the trained model maintains its advantage when the underlying corpus changes frequently after deployment.

Load-bearing premise

The specialized RL environment can efficiently optimize embedding generation without requiring complete corpus re-encoding at each training iteration while still producing embeddings that generalize to unseen queries and corpora.
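The premise above turns on keeping corpus embeddings usable while the encoder keeps changing. The paper excerpts reproduced on this page do not describe how the Search-R3 environment achieves this (the delta-indexing and caching in the simulated rebuttal below are simulated, not the authors' text), so the following is only one generic pattern, sketched under that caveat: a hypothetical `StaleEmbeddingCache` stores each document's embedding together with the model version that produced it and, after each policy update, re-encodes only a fixed budget of the stalest documents instead of the whole corpus. It is not the paper's mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]
Encoder = Callable[[str], Vector]


@dataclass
class CachedDoc:
    text: str
    embedding: Vector
    version: int  # model version that produced `embedding`


class StaleEmbeddingCache:
    """Hypothetical cache that tolerates stale embeddings and refreshes a bounded
    number of documents per model update instead of re-encoding the corpus."""

    def __init__(self, encode: Encoder, refresh_budget: int) -> None:
        self.encode = encode
        self.refresh_budget = refresh_budget
        self.model_version = 0
        self.docs: Dict[str, CachedDoc] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = CachedDoc(text, self.encode(text), self.model_version)

    def on_model_update(self, new_encode: Encoder) -> List[str]:
        """Swap in the updated encoder and re-encode only the stalest documents."""
        self.encode = new_encode
        self.model_version += 1
        # sorted() is stable, so documents with equal versions keep insertion order.
        stalest = sorted(self.docs.items(), key=lambda item: item[1].version)
        refreshed = []
        for doc_id, doc in stalest[: self.refresh_budget]:
            doc.embedding = self.encode(doc.text)
            doc.version = self.model_version
            refreshed.append(doc_id)
        return refreshed

    def embedding(self, doc_id: str) -> Vector:
        return self.docs[doc_id].embedding


# Toy encoders that return a constant vector per model version.
cache = StaleEmbeddingCache(encode=lambda text: [0.0], refresh_budget=2)
for doc_id in ("a", "b", "c"):
    cache.add(doc_id, f"document {doc_id}")
first = cache.on_model_update(lambda text: [1.0])   # ["a", "b"]
second = cache.on_model_update(lambda text: [2.0])  # ["c", "a"]
```

The refresh budget caps the encoding cost per step at the price of some staleness; whether stale vectors still give a useful training signal is exactly the question the premise above raises.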

What would settle it

Run Search-R3 on a held-out corpus and query set; if retrieval accuracy falls below strong non-unified baselines or if the model requires full re-encoding to maintain performance, the unification claim does not hold.
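The test above needs a retrieval-accuracy metric applied identically to Search-R3 and to the non-unified baselines on the held-out set. Recall@k is a common choice; this page does not say which metrics the paper reports, so the minimal sketch below is an assumption, with hypothetical query and document ids.

```python
from typing import Dict, Sequence, Set


def recall_at_k(ranked: Dict[str, Sequence[str]],
                relevant: Dict[str, Set[str]],
                k: int) -> float:
    """Mean over queries of |top-k retrieved ∩ relevant| / |relevant|.
    Queries with no relevant documents are skipped."""
    per_query = []
    for qid, rel in relevant.items():
        if not rel:
            continue
        top_k = set(ranked.get(qid, [])[:k])
        per_query.append(len(top_k & rel) / len(rel))
    return sum(per_query) / len(per_query) if per_query else 0.0


# Hypothetical held-out relevance judgments and rankings from two systems.
relevant = {"q1": {"d2"}, "q2": {"d4", "d5"}}
unified = {"q1": ["d2", "d1", "d3"], "q2": ["d4", "d9", "d5"]}
baseline = {"q1": ["d1", "d2", "d3"], "q2": ["d9", "d4", "d7"]}
```

Evaluating both systems with the same function at several cutoffs matters: in this toy data they tie at k = 2 but separate at k = 1 and k = 3.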

Figures

Figures reproduced from arXiv: 2510.07048 by James Cheng, Yuntao Gui.

**Figure 1.** Illustration of Search-R3. Despite these powerful reasoning capabilities, LLMs have been surprisingly underutilized in searching and embedding applications. Current approaches to search typically operate independently from LLMs and their reasoning processes, creating an artificial separation between how models comprehend content and how information is retrieved. This disconnection prevents the sophisticate…

**Figure 2.** Training pipeline of Search-R3. (§3 Overview) Our framework transforms an instruction-tuned base model (e.g., Qwen) into a powerful embedding generator through a systematic two-stage training pipeline. The first stage integrates supervised fine-tuning (SFT) with contrastive learning (§4.1), teaching the model to recognize and respond to our specialized <|embed_token|> token while developing embedding…
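The Figure 2 excerpt says the model is taught to respond to a special <|embed_token|> token. A common way to turn a decoder-only LLM into an embedder is to read the final-layer hidden state at such a token and L2-normalize it; whether Search-R3 does exactly this is an assumption here, and `EMBED_TOKEN_ID` is a hypothetical placeholder (the real id depends on the tokenizer). The sketch uses plain lists so it runs without a model.

```python
import math
from typing import List, Sequence

EMBED_TOKEN_ID = 99_999  # hypothetical id for <|embed_token|>


def pool_at_embed_token(token_ids: Sequence[int],
                        hidden_states: Sequence[Sequence[float]],
                        embed_token_id: int = EMBED_TOKEN_ID) -> List[float]:
    """Return the L2-normalized hidden state at the last <|embed_token|>,
    i.e. after any reasoning tokens the model generated before it."""
    if len(token_ids) != len(hidden_states):
        raise ValueError("expected one hidden state per token")
    positions = [i for i, t in enumerate(token_ids) if t == embed_token_id]
    if not positions:
        raise ValueError("sequence contains no <|embed_token|>")
    vec = list(hidden_states[positions[-1]])
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


# Toy sequence: three reasoning tokens followed by the embed token.
ids = [101, 102, 103, EMBED_TOKEN_ID]
states = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.5], [3.0, 4.0]]
embedding = pool_at_embed_token(ids, states)  # [0.6, 0.8]
```

In a causal model the hidden state at the last embed token attends to every earlier token, so the vector is conditioned on the whole chain of thought that precedes it.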

**Figure 3.** System prompt in Stage 2. … distance constraints, ensuring positive documents remain closer to the query than negative ones by at least the margin θ. We set the margin parameter θ = 0.15 in practice. Through this comprehensive training approach, we effectively transform the LLM's next-token prediction capability into a mechanism for generating high-quality semantic vectors. The model learns to analyze input …
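The Figure 3 excerpt describes a margin constraint: each positive document must stay closer to the query than each negative by at least θ = 0.15. The standard way to write that is the triplet margin loss max(0, d(q, p) - d(q, n) + θ). The sketch below assumes cosine distance d = 1 - cos, which the excerpt does not name, and shows only this one term, not the full stage-one objective that the Figure 2 excerpt says combines SFT with contrastive learning.

```python
import math
from typing import Sequence

MARGIN = 0.15  # theta from the excerpt


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def triplet_margin_loss(query: Sequence[float],
                        positive: Sequence[float],
                        negative: Sequence[float],
                        margin: float = MARGIN) -> float:
    """Zero once the positive is closer than the negative by at least `margin`
    under cosine distance; grows linearly with the violation otherwise."""
    d_pos = 1.0 - cosine_similarity(query, positive)
    d_neg = 1.0 - cosine_similarity(query, negative)
    return max(0.0, d_pos - d_neg + margin)


q = [1.0, 0.0]
satisfied = triplet_margin_loss(q, positive=[1.0, 0.0], negative=[0.0, 1.0])  # 0.0
violated = triplet_margin_loss(q, positive=[0.0, 1.0], negative=[1.0, 0.0])   # 1.15
```

Because the hinge is flat once the margin is met, well-separated triplets stop contributing gradient and training focuses on the hard ones.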

**Figure 5.** Score distributions before and after RL training.

Original abstract

Despite their remarkable natural language understanding capabilities, Large Language Models (LLMs) have been underutilized for retrieval tasks. We present Search-R3, a novel framework that addresses this limitation by adapting LLMs to generate search embeddings as a direct output of their reasoning process. Our approach exploits LLMs' chain-of-thought capabilities, allowing them to produce more effective embeddings by reasoning step-by-step through complex semantic analyses. We implement this through three complementary mechanisms: (1) a supervised learning stage that enables the model to produce quality embeddings, (2) a reinforcement learning (RL) methodology that optimizes embedding generation alongside reasoning, and (3) a specialized RL environment that efficiently handles evolving embedding representations without requiring complete corpus re-encoding at each training iteration. Our extensive evaluations on diverse benchmarks demonstrate that Search-R3 significantly outperforms prior methods by unifying the reasoning and embedding generation processes. This integrated post-training approach represents a substantial advancement in handling complex knowledge-intensive tasks that require both sophisticated reasoning and effective information retrieval. Project page: https://github.com/ytgui/Search-R3

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

Search-R3 folds embedding generation into an LLM's chain-of-thought via RL and a custom environment to skip full re-encoding, but the abstract leaves the key efficiency mechanism and the actual results too vague to judge.


The main new piece is the specialized RL environment that updates embeddings incrementally without re-encoding the whole corpus at each step. This is paired with supervised pre-training for basic embedding quality and then RL that tunes both reasoning and embedding output together. The framing around using chain-of-thought to improve semantic analysis before embedding is a sensible extension of existing reasoning work, and the paper correctly identifies that current LLMs are underused for retrieval inside knowledge-intensive tasks.

Referee Report

2 major / 1 minor

Summary. The paper introduces Search-R3, a framework adapting LLMs to generate search embeddings directly from their reasoning process. It employs three mechanisms: (1) supervised learning to produce quality embeddings, (2) RL optimization of embedding generation alongside reasoning, and (3) a specialized RL environment that handles evolving embedding representations without requiring complete corpus re-encoding at each iteration. The central claim is that this unification yields significant outperformance over prior methods on diverse benchmarks for complex knowledge-intensive tasks.

Significance. If the empirical claims hold with proper controls, the work would advance the integration of reasoning and retrieval in LLMs by leveraging chain-of-thought for embedding generation and by addressing computational barriers via incremental embedding updates. The public GitHub project page is a positive step toward reproducibility.

major comments (2)

[Abstract / RL environment description] Abstract and description of mechanism (3): The specialized RL environment is claimed to efficiently handle evolving embeddings without complete corpus re-encoding, yet no approximation technique (caching, delta-indexing, or otherwise), reward formulation for embedding quality, or ablation isolating this component is provided. This is load-bearing for the unification claim, as the headline outperformance and efficiency gains rest on embeddings remaining high-quality and generalizable to unseen queries/corpora under incremental updates.
[Evaluation / Experiments] Evaluation section: The abstract asserts 'extensive evaluations' and 'significant outperformance' but supplies no specifics on benchmarks, baselines, statistical significance testing, or ablations. Without these, it is impossible to confirm that reported gains do not reduce to fitted reward parameters or self-referential loops, undermining verification of the central claim.

minor comments (1)

[Abstract] The abstract could more clearly distinguish the three mechanisms and their individual contributions to avoid conflating supervised fine-tuning effects with the RL components.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight areas where additional technical details and clarity can strengthen the presentation of Search-R3. We address each major comment below and indicate the revisions we will incorporate.


Referee: [Abstract / RL environment description] Abstract and description of mechanism (3): The specialized RL environment is claimed to efficiently handle evolving embeddings without complete corpus re-encoding, yet no approximation technique (caching, delta-indexing, or otherwise), reward formulation for embedding quality, or ablation isolating this component is provided. This is load-bearing for the unification claim, as the headline outperformance and efficiency gains rest on embeddings remaining high-quality and generalizable to unseen queries/corpora under incremental updates.

Authors: We agree that the current description of the specialized RL environment is high-level and would benefit from greater technical specificity to support the efficiency and unification claims. In the revised manuscript, we will expand Section 3.3 to detail the approximation technique, which relies on delta-indexing combined with selective caching of prior embedding states to avoid full corpus re-encoding. We will also explicitly state the reward formulation (a composite of embedding similarity to ground-truth retrieval targets and reasoning coherence) and include a dedicated ablation that isolates the incremental-update component. These additions will directly address concerns about generalizability to unseen queries and corpora. revision: yes
Referee: [Evaluation / Experiments] Evaluation section: The abstract asserts 'extensive evaluations' and 'significant outperformance' but supplies no specifics on benchmarks, baselines, statistical significance testing, or ablations. Without these, it is impossible to confirm that reported gains do not reduce to fitted reward parameters or self-referential loops, undermining verification of the central claim.

Authors: The referee correctly notes that the abstract is summary-level. The full manuscript reports results across multiple knowledge-intensive benchmarks with comparisons to prior retrieval and reasoning methods, plus initial ablations. To improve verifiability, we will revise the abstract to name the primary benchmarks and baselines, and we will augment the evaluation section with explicit statistical significance testing (e.g., paired t-tests) and further ablations that control for reward-parameter fitting and potential self-referential effects. These changes will allow readers to better assess whether gains stem from the unified reasoning-embedding process. revision: yes
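The paired t-test mentioned in the simulated response compares two systems on the same queries, so it operates on per-query score differences. A minimal sketch of the t statistic, assuming per-query metric values (for example recall or nDCG) are available for both systems; the scores below are made up for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for per-query differences d = a - b,
    with n - 1 degrees of freedom."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with n >= 2")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0.0:
        raise ValueError("differences are constant; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Illustrative per-query scores for a unified system and a baseline.
unified = [0.5, 0.6, 0.7, 0.8]
baseline = [0.4, 0.4, 0.6, 0.6]
t = paired_t_statistic(unified, baseline)  # about 5.196 with 3 degrees of freedom
```

For a p-value, compare t against Student's t distribution with n - 1 degrees of freedom; `scipy.stats.ttest_rel` computes the same statistic together with its p-value.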

Circularity Check

0 steps flagged

No circularity: unification presented as empirical framework with external validation


The abstract outlines three mechanisms—supervised learning for embedding quality, RL optimization of embeddings alongside reasoning, and a specialized RL environment for incremental updates without full re-encoding—then claims outperformance via evaluations on diverse benchmarks. No equations, fitted parameters renamed as predictions, or self-citations are shown that would reduce the unification result to its own inputs by construction. The derivation chain is presented as a composite training approach whose success is measured externally rather than derived tautologically from the mechanisms themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that LLMs can be post-trained to produce high-quality embeddings as part of reasoning without separate embedding models. No explicit free parameters, axioms, or invented entities are stated in the abstract, but the RL reward design and the specialized environment are implicit modeling choices whose details are not provided.

pith-pipeline@v0.9.0 · 5711 in / 1240 out tokens · 23037 ms · 2026-05-18T09:22:07.698156+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

specialized RL environment that efficiently handles evolving embedding representations without requiring complete corpus re-encoding at each training iteration

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat.induction · unclear

GRPO with DCG-scaled reward on query-positive-negative triplets

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · 24 internal anchors

[1]

Anirudh Ajith, Mengzhou Xia, Alexis Chevalier, Tanya Goyal, Danqi Chen, and Tianyu Gao. 2024. LitSearch: A Retrieval Benchmark for Scientific Literature Search. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Miami, FL, USA, November 12-16, 2024, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.)...

work page doi:10.18653/v1/2024.emnlp-main.840 2024
[2]

Ben Abacha Asma and Demner-Fushman Dina. 2019. A Question-Entailment Ap- proach to Question Answering.BMC Bioinform.20, 1 (2019), 511:1–511:23. https: //bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3119-4

work page doi:10.1186/s12859-019-3119-4 2019
[3]

Parul Awasthy, Aashka Trivedi, Yulong Li, Mihaela Bornea, David Cox, Abraham Daniels, Martin Franz, Gabe Goodhart, Bhavani Iyer, Vishwajeet Kumar, et al

work page
[4]

Granite Embedding Models.arXiv preprint arXiv:2502.20204(2025)

work page arXiv 2025
[5]

Vassileios Balntas, Edgar Riba, Daniel Ponsa, and Krystian Mikolajczyk. 2016. Learning local feature descriptors with triplets and shallow convolutional neural networks. InProceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016, Richard C. Wilson, Edwin R. Hancock, and William A. P. Smith (Eds.). BMVA Press. h...

work page 2016
[6]

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomás Mikolov. 2017. Enriching Word Vectors with Subword Information.Trans. Assoc. Comput. Lin- guistics5 (2017), 135–146. doi:10.1162/TACL_A_00051

work page doi:10.1162/tacl_a_00051 2017
[7]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin...

work page 2020
[8]

Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu

work page
[9]

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation.CoRRabs/2402.03216 (2024). doi:10.48550/ARXIV.2402.03216 arXiv:2402.03216

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2402.03216 2024
[10]

Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul N. Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik, Harsha Vardhan Simhadri,...

work page arXiv 2024
[11]

Zhao, Yanping Huang, Andrew M

Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Al- bert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Y. Zhao, Yanp...

work page 2024
[12]

Wikipedia contributors. 2025. Wikipedia, the free encyclopedia. https://en. wikipedia.org/wiki/Main_Page Accessed: 2025-10-01

work page 2025
[13]

Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Soham Dan, et al. 2024. Larimar: Large language models with episodic memory control. arXiv preprint arXiv:2403.11901(2024)

work page arXiv 2024
[14]

DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai D...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.12948 2025
[15]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Associa- tion for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, ...

work page doi:10.18653/v1/n19-1423 2019
[16]

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2024. From local to global: A graph rag approach to query-focused summarization.arXiv preprint arXiv:2404.16130(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[17]

Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple Contrastive Learning of Sentence Embeddings. InProceedings of the 2021 Conference on Em- pirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (...

work page doi:10.18653/v1/2021.emnlp-main.552 2021
[18]

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yixin Dai, Jiawei Sun, Haofen Wang, and Haofen Wang. 2023. Retrieval-augmented generation for large language models: A survey.arXiv preprint arXiv:2312.10997 2, 1 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[19]

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. 2024. The llama 3 herd of models.arXiv preprint arXiv:2407.21783 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[20]

Yuxian Gu, Li Dong, Furu Wei, and Minlie Huang. 2023. Minillm: Knowledge distillation of large language models.arXiv preprint arXiv:2306.08543(2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[21]

Haoyu Han, Yu Wang, Harry Shomer, Kai Guo, Jiayuan Ding, Yongjia Lei, Ma- hantesh Halappanavar, Ryan A Rossi, Subhabrata Mukherjee, Xianfeng Tang, et al

work page
[22]

Retrieval-augmented generation with graphs (graphrag).arXiv preprint arXiv:2501.00309(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[23]

Zexue He, Leonid Karlinsky, Donghyun Kim, Julian McAuley, Dmitry Krotov, and Rogerio Feris. 2024. Camelot: Towards large language models with training-free consolidated associative memory.arXiv preprint arXiv:2402.13449(2024)

work page arXiv 2024
[24]

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network.arXiv preprint arXiv:1503.02531(2015)

work page internal anchor Pith review Pith/arXiv arXiv 2015
[25]

Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. InThe Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net. https://openreview.net/forum?id=nZeVKeeFYf9

work page 2022
[26]

Junjie Huang, Duyu Tang, Linjun Shou, Ming Gong, Ke Xu, Daxin Jiang, Ming Zhou, and Nan Duan. 2021. CoSQA: 20, 000+ Web Queries for Code Search and Question Answering. InProceedings of the 59th Annual Meeting of the As- sociation for Computational Linguistics and the 11th International Joint Confer- ence on Natural Language Processing, ACL/IJCNLP 2021, (V...

work page doi:10.18653/v1/2021.acl-long.442 2021
[27]

HuggingFaceFW. 2025. clean-wikipedia dataset at Hugging Face. https: //huggingface.co/datasets/HuggingFaceFW/clean-wikipedia Accessed: 2025-10- 01

work page 2025
[28]

Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. Codesearchnet challenge: Evaluating the state of semantic code search.arXiv preprint arXiv:1909.09436(2019)

work page internal anchor Pith review Pith/arXiv arXiv 2019
[29]

Bowen Jin, Hansi Zeng, Zhenrui Yue, Dong Wang, Hamed Zamani, and Jiawei Han. 2025. Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning.CoRRabs/2503.09516 (2025). doi:10.48550/ARXIV. 2503.09516 arXiv:2503.09516

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv 2025
[30]

doi: 10.18653/v1/P17-1147

Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. 2017. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehen- sion. InProceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Pa- pers, Regina Barzilay and Min-Yen K...

work page doi:10.18653/v1/p17-1147 2017
[31]

Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense Passage Retrieval for Open- Domain Question Answering. InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, Bonnie Webber, Trevor Cohn, Yulan He, and Ya...

work page doi:10.18653/v1/2020.emnlp-main.550 2020
[32]

Andreas Köpf, Yannic Kilcher, Dimitri von Rütte, Sotiris Anagnostidis, Zhi Rui Tam, Keith Stevens, Abdullah Barhoum, Duc Nguyen, Oliver Stan- ley, Richárd Nagyfi, Shahul ES, Sameer Suri, David Glushkov, Arnav Dan- tuluri, Andrew Maguire, Christoph Schuhmann, Huu Nguyen, and Alexan- der Mattick. 2023. OpenAssistant Conversations - Democratizing Large Langu...

work page 2023
[33]

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A Lite BERT for Self-supervised Learn- ing of Language Representations. In8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenRe- view.net. https://openreview.net/forum?id=H1eA7AEtvS

work page 2020
[34]

Xiang Lisa Li and Percy Liang. 2021. Prefix-Tuning: Optimizing Continuous Prompts for Generation. InProceedings of the 59th Annual Meeting of the As- sociation for Computational Linguistics and the 11th International Joint Confer- ence on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Pa- pers), Virtual Event, August 1-6, 2021, Chengqing Zo...

work page doi:10.18653/v1/2021.acl-long.353 2021
[35]

Yiwei Li, Jiayi Shi, Shaoxiong Feng, Peiwen Yuan, Xinglin Wang, Boyuan Pan, Heda Wang, Yao Hu, and Kan Li. 2024. Instruction Embedding: Latent Repre- sentations of Instructions Towards Task Identification. InAdvances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, C...

work page 2024
[36]

Zhizhong Li and Derek Hoiem. 2017. Learning without forgetting.IEEE transac- tions on pattern analysis and machine intelligence40, 12 (2017), 2935–2947

work page 2017
[37]

Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, and Meis- han Zhang. 2023. Towards General Text Embeddings with Multi-stage Con- trastive Learning.CoRRabs/2308.03281 (2023). doi:10.48550/ARXIV.2308.03281 arXiv:2308.03281

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2308.03281 2023
[38]

Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. 2023. Let’s verify step by step. InThe Twelfth International Conference on Learning Representations

work page 2023
[39]

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2023. Visual in- struction tuning.Advances in neural information processing systems36 (2023), 34892–34916

work page 2023
[40]

2022.LlamaIndex

Jerry Liu. 2022.LlamaIndex. doi:10.5281/zenodo.1234

work page doi:10.5281/zenodo.1234 2022
[41]

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach.CoRRabs/1907.11692 (2019). Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Yuntao Gui and James Cheng arXiv:1907.11692 http://arxiv.org/abs/1907.11692

work page internal anchor Pith review Pith/arXiv arXiv 2019
[42]

Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel S. Weld

work page
[43]

InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel R

S2ORC: The Semantic Scholar Open Research Corpus. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel R. Tetreault (Eds.). Association for Computational Linguistics, 4969–4983. doi:10.18653/V1/2020.ACL-MAIN.447

work page doi:10.18653/v1/2020.acl-main.447 2020
[44]

Le, Barret Zoph, Jason Wei, and Adam Roberts

Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, and Adam Roberts. 2023. The Flan Collection: Designing Data and Methods for Effective Instruction Tuning. InInternational Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA (Proceedings of Machine Learning R...

work page 2023
[45]

Shayne Longpre, Yi Lu, and Joachim Daiber. 2020. MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering. https: //arxiv.org/pdf/2007.15207.pdf

work page arXiv 2020
[46]

Corrado, and Jeffrey Dean

Tomás Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. InAdvances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Pro- ceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevad...

work page 2013
[47]

Shervin Minaee, Nal Kalchbrenner, Erik Cambria, Narjes Nikzad, Meysam Chenaghlu, and Jianfeng Gao. 2022. Deep Learning-based Text Classifica- tion: A Comprehensive Review.ACM Comput. Surv.54, 3 (2022), 62:1–62:40. doi:10.1145/3439726

work page doi:10.1145/3439726 2022
[48]

N Muennighoff. [n. d.]. Sgpt: Gpt sentence embeddings for semantic search. arXiv 2022.arXiv preprint arXiv:2202.08904([n. d.])

work page arXiv 2022
[49]

Niklas Muennighoff, Nouamane Tazi, Loïc Magne, and Nils Reimers. 2022. MTEB: Massive Text Embedding Benchmark.arXiv preprint arXiv:2210.07316(2022). doi:10.48550/ARXIV.2210.07316

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2210.07316 2022
[50]

Hall, Daniel Cer, and Yinfei Yang

Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, and Yinfei Yang. 2022. Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models. InFindings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, May 22-27, 2022, Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). ...

work page doi:10.18653/v1/2022.findings-acl.146 2022
[51]

Zhao, Yi Luan, Keith B

Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernández Ábrego, Ji Ma, Vincent Y. Zhao, Yi Luan, Keith B. Hall, Ming-Wei Chang, and Yinfei Yang. 2022. Large Dual Encoders Are Generalizable Retrievers. InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 202...

work page doi:10.18653/v1/2022.emnlp-main.669 2022
[52]

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding.arXiv preprint arXiv:1807.03748(2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[53]

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, and Ryan Lowe. 2022. Train- ing language models to follow instructions with h...

[54] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, Alessandro Moschitti, Bo Pang, and Walter Dael... doi:10.3115/v1/d14-1162

[55] Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, and Xuanjing Huang. 2020. Pre-trained Models for Natural Language Processing: A Survey. CoRR abs/2003.08271 (2020). arXiv:2003.08271 https://arxiv.org/abs/2003.08271

[56] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July ...

[57] Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, Dece...

[58] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, Kentaro Inui, Jing Jiang, Vince... doi:10.18653/v1/d19-1410

[59] Stephen Robertson, Hugo Zaragoza, et al. 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval 3, 4 (2009), 333–389

[60] Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2020. A Primer in BERTology: What We Know About How BERT Works. Trans. Assoc. Comput. Linguistics 8 (2020), 842–866. doi:10.1162/TACL_A_00349

[61] Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information processing & management 24, 5 (1988), 513–523

[62] Victor Sanh, Albert Webson, Colin Raffel, Stephen H Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, et al. 2021. Multitask prompted training enables zero-shot task generalization. arXiv preprint arXiv:2110.08207 (2021).

[64] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. CoRR abs/1707.06347 (2017). arXiv:1707.06347 http://arxiv.org/abs/1707.06347

[66] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. CoRR abs/2402.03300 (2024). doi:10.48550/ARXIV.2402.03300 arXiv:2402.03300

[67] Karen Spärck Jones. 2004. A statistical interpretation of term specificity and its application in retrieval. Journal of documentation 60, 5 (2004), 493–502

[68] Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, and Tao Yu. 2023. One Embedder, Any Task: Instruction-Finetuned Text Embeddings. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023, Anna Rogers, Jordan L. Boyd-Graber, and Naoaki... doi:10.18653/v1/2023.findings-acl.71

[69] Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, and Zhaochun Ren. 2023. Is ChatGPT good at search? Investigating large language models as re-ranking agents. arXiv preprint arXiv:2304.09542 (2023)

[70] Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B Hashimoto. 2023. Alpaca: A strong, replicable instruction-following model. Stanford Center for Research on Foundation Models. https://crfm.stanford.edu/2023/03/13/alpaca.html 3, 6 (2023), 7

[71] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. CoRR abs/2302.13971 (2023). doi:10.48550/ARXIV.2302.13971 arXiv:...

[72] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)

[73] Henrique Schechter Vera, Sahil Dua, Biao Zhang, Daniel Salz, Ryan Mullins, Sindhu Raghuram Panyam, Sara Smoot, Iftekhar Naim, Joe Zou, Feiyang Chen, et al. 2025. EmbeddingGemma: Powerful and Lightweight Text Representations. arXiv preprint arXiv:2509.20354 (2025)

[74] David Wadden, Shanchuan Lin, Kyle Lo, Lucy Lu Wang, Madeleine van Zuylen, Arman Cohan, and Hannaneh Hajishirzi. 2020. Fact or Fiction: Verifying Scientific Claims. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 7534–7550. doi:10.18653/v1/2020.emnlp-main.609

[75] Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. 2022. Text Embeddings by Weakly-Supervised Contrastive Pre-training. CoRR abs/2212.03533 (2022). doi:10.48550/ARXIV.2212.03533 arXiv:2212.03533

[76] Peiyi Wang, Lei Li, Zhihong Shao, Runxin Xu, Damai Dai, Yifei Li, Deli Chen, Yu Wu, and Zhifang Sui. 2024. Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024, Lun-... doi:10.18653/v1/2024.acl-long.510

[77] Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, and Furu Wei. 2023. Augmenting language models with long-term memory. Advances in Neural Information Processing Systems 36 (2023), 74530–74543

[78] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA,...

[79] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025)

[80] An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li, Tingyu X... doi:10.48550/arxiv.2412.15115
