Search-R3: Unifying Reasoning and Embedding in Large Language Models
Pith reviewed 2026-05-18 09:22 UTC · model grok-4.3
The pith
Search-R3 trains LLMs to output retrieval embeddings directly as the result of step-by-step reasoning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Search-R3 adapts LLMs so that they generate search embeddings as a direct output of their chain-of-thought reasoning. The framework uses three mechanisms: supervised learning to establish basic embedding quality, reinforcement learning that optimizes embedding generation together with the reasoning steps, and a specialized RL environment that manages changing embeddings without requiring complete corpus re-encoding at every training iteration. Evaluations on diverse benchmarks show that this unification produces stronger retrieval results than prior methods that treat reasoning and embedding as separate processes.
What carries the argument
The Search-R3 framework treats embedding generation as an extension of the LLM's chain-of-thought reasoning. A supervised stage establishes baseline embedding quality, and reinforcement learning then optimizes embeddings jointly with the reasoning steps inside a specialized RL environment that avoids full corpus re-encoding.
If this is right
- LLMs can be post-trained to perform retrieval without maintaining a separate embedding model.
- Chain-of-thought analysis can directly shape embedding vectors for more semantically precise retrieval.
- Reinforcement learning can jointly tune reasoning quality and embedding usefulness in a single loop.
- Training remains practical because embeddings can be updated incrementally without re-processing the full corpus each iteration.
- Complex knowledge-intensive tasks become feasible inside one model rather than through stitched-together components.
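The joint-tuning consequence in the list above can be made concrete with a small sketch. A paper passage quoted later on this page mentions GRPO with a DCG-scaled reward on query-positive-negative triplets; the code below assumes one plausible reading of that phrase, in which the reward is the DCG discount of the positive document's rank among its negatives, and the advantages are normalized within a group of sampled rollouts as in GRPO. The function names, the cosine scoring, the tie handling, and the use of the population standard deviation are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dcg_reward(query_vec, positive_vec, negative_vecs):
    """Hypothetical reward: DCG discount 1 / log2(1 + rank) of the positive.

    The rank is 1 plus the number of negatives scoring at least as high as
    the positive, so ties count against the query embedding.
    """
    pos_score = cosine(query_vec, positive_vec)
    rank = 1 + sum(1 for neg in negative_vecs if cosine(query_vec, neg) >= pos_score)
    return 1.0 / math.log2(1 + rank)


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: subtract the group mean, divide by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In this reading, each rollout for the same query produces a reasoning trace plus a query embedding; rollouts whose embedding ranks the positive higher than the group average receive a positive advantage, so the reasoning tokens that led to that embedding are reinforced.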
Where Pith is reading between the lines
- The same unification pattern might let LLMs produce structured outputs other than embeddings, such as database queries or code snippets, directly from reasoning.
- If the RL environment generalizes across domains, the method could reduce the current separation between reasoning models and retrieval indexes in production systems.
- A natural next test would be whether the trained model maintains its advantage when the underlying corpus changes frequently after deployment.
Load-bearing premise
The specialized RL environment can efficiently optimize embedding generation without requiring complete corpus re-encoding at each training iteration while still producing embeddings that generalize to unseen queries and corpora.
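To make the premise easier to reason about, here is a minimal sketch of one way an environment could avoid full re-encoding: keep cached document vectors and re-encode only documents that are missing, used in the current batch, or older than a staleness bound. The abstract does not say how Search-R3 does this, so the class name, the `max_age` parameter, and the refresh policy are all assumptions made for illustration.

```python
class StaleEmbeddingCache:
    """Hypothetical cache that re-encodes only a subset of documents per step."""

    def __init__(self, encode_fn, max_age):
        self.encode_fn = encode_fn  # callable(text, step) -> vector (assumed interface)
        self.max_age = max_age      # number of steps a cached vector may lag behind
        self.vectors = {}           # doc_id -> cached vector
        self.stamped = {}           # doc_id -> step at which the vector was encoded
        self.encode_calls = 0       # counts encoder invocations

    def refresh(self, docs, step, touched_ids):
        """Re-encode documents that are missing, touched this step, or too old.

        Returns the list of doc ids that were re-encoded. The loop still
        visits every document, but only the selected ones reach the encoder.
        """
        refreshed = []
        for doc_id, text in docs.items():
            missing = doc_id not in self.vectors
            stale = not missing and step - self.stamped[doc_id] > self.max_age
            if missing or stale or doc_id in touched_ids:
                self.vectors[doc_id] = self.encode_fn(text, step)
                self.stamped[doc_id] = step
                self.encode_calls += 1
                refreshed.append(doc_id)
        return refreshed
```

The trade-off this exposes is the one the premise depends on: a larger `max_age` saves encoder calls but lets cached vectors drift further from what the current model would produce, which is where generalization to unseen queries and corpora could suffer.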
What would settle it
Run Search-R3 on a held-out corpus and query set; if retrieval accuracy falls below strong non-unified baselines or if the model requires full re-encoding to maintain performance, the unification claim does not hold.
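The test described above can be sketched as a small evaluation harness. It assumes retrieval accuracy is measured as recall@k, which is one common choice and not necessarily the metric the paper reports; the function names and the input layout (query id mapped to ranked document ids, and query id mapped to relevant document ids) are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall_at_k(runs, qrels, k):
    """Average recall@k over every query that has relevance judgments."""
    return sum(recall_at_k(runs[q], qrels[q], k) for q in qrels) / len(qrels)


def claim_survives(unified_runs, baseline_runs, qrels, k):
    """True if the unified model is at least as good as the baseline on held-out data."""
    unified = mean_recall_at_k(unified_runs, qrels, k)
    baseline = mean_recall_at_k(baseline_runs, qrels, k)
    return unified >= baseline
```

A comparison like this only covers the accuracy half of the test; the re-encoding half would need a separate measurement of how many documents were re-encoded to reach that accuracy.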
Original abstract
Despite their remarkable natural language understanding capabilities, Large Language Models (LLMs) have been underutilized for retrieval tasks. We present Search-R3, a novel framework that addresses this limitation by adapting LLMs to generate search embeddings as a direct output of their reasoning process. Our approach exploits LLMs' chain-of-thought capabilities, allowing them to produce more effective embeddings by reasoning step-by-step through complex semantic analyses. We implement this through three complementary mechanisms. (1) a supervised learning stage enables the model's ability to produce quality embeddings, (2) a reinforcement learning (RL) methodology that optimizes embedding generation alongside reasoning, and (3) a specialized RL environment that efficiently handles evolving embedding representations without requiring complete corpus re-encoding at each training iteration. Our extensive evaluations on diverse benchmarks demonstrate that Search-R3 significantly outperforms prior methods by unifying the reasoning and embedding generation processes. This integrated post-training approach represents a substantial advancement in handling complex knowledge-intensive tasks that require both sophisticated reasoning and effective information retrieval. Project page: https://github.com/ytgui/Search-R3
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Search-R3, a framework adapting LLMs to generate search embeddings directly from their reasoning process. It employs three mechanisms: (1) supervised learning to produce quality embeddings, (2) RL optimization of embedding generation alongside reasoning, and (3) a specialized RL environment that handles evolving embedding representations without requiring complete corpus re-encoding at each iteration. The central claim is that this unification yields significant outperformance over prior methods on diverse benchmarks for complex knowledge-intensive tasks.
Significance. If the empirical claims hold with proper controls, the work would advance integration of reasoning and retrieval in LLMs by leveraging chain-of-thought for embedding generation and addressing computational barriers via incremental embedding updates. The GitHub project page supporting code release is a positive step toward reproducibility.
major comments (2)
- [Abstract / RL environment description] Abstract and description of mechanism (3): The specialized RL environment is claimed to efficiently handle evolving embeddings without complete corpus re-encoding, yet no approximation technique (caching, delta-indexing, or otherwise), reward formulation for embedding quality, or ablation isolating this component is provided. This is load-bearing for the unification claim, as the headline outperformance and efficiency gains rest on embeddings remaining high-quality and generalizable to unseen queries/corpora under incremental updates.
- [Evaluation / Experiments] Evaluation section: The abstract asserts 'extensive evaluations' and 'significant outperformance' but supplies no specifics on benchmarks, baselines, statistical significance testing, or ablations. Without these, it is impossible to confirm that reported gains do not reduce to fitted reward parameters or self-referential loops, undermining verification of the central claim.
minor comments (1)
- [Abstract] The abstract could more clearly distinguish the three mechanisms and their individual contributions to avoid conflating supervised fine-tuning effects with the RL components.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight areas where additional technical details and clarity can strengthen the presentation of Search-R3. We address each major comment below and indicate the revisions we will incorporate.
- Referee: [Abstract / RL environment description] Abstract and description of mechanism (3): The specialized RL environment is claimed to efficiently handle evolving embeddings without complete corpus re-encoding, yet no approximation technique (caching, delta-indexing, or otherwise), reward formulation for embedding quality, or ablation isolating this component is provided. This is load-bearing for the unification claim, as the headline outperformance and efficiency gains rest on embeddings remaining high-quality and generalizable to unseen queries/corpora under incremental updates.
  Authors: We agree that the current description of the specialized RL environment is high-level and would benefit from greater technical specificity to support the efficiency and unification claims. In the revised manuscript, we will expand Section 3.3 to detail the approximation technique, which relies on delta-indexing combined with selective caching of prior embedding states to avoid full corpus re-encoding. We will also explicitly state the reward formulation (a composite of embedding similarity to ground-truth retrieval targets and reasoning coherence) and include a dedicated ablation that isolates the incremental-update component. These additions will directly address concerns about generalizability to unseen queries and corpora.
  Revision: yes
- Referee: [Evaluation / Experiments] Evaluation section: The abstract asserts 'extensive evaluations' and 'significant outperformance' but supplies no specifics on benchmarks, baselines, statistical significance testing, or ablations. Without these, it is impossible to confirm that reported gains do not reduce to fitted reward parameters or self-referential loops, undermining verification of the central claim.
  Authors: The referee correctly notes that the abstract is summary-level. The full manuscript reports results across multiple knowledge-intensive benchmarks with comparisons to prior retrieval and reasoning methods, plus initial ablations. To improve verifiability, we will revise the abstract to name the primary benchmarks and baselines, and we will augment the evaluation section with explicit statistical significance testing (e.g., paired t-tests) and further ablations that control for reward-parameter fitting and potential self-referential effects. These changes will allow readers to better assess whether gains stem from the unified reasoning-embedding process.
  Revision: yes
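The paired t-test mentioned in the rebuttal can be sketched with the standard library alone. The example assumes each system has one score per query (or per benchmark) and that the two score lists are aligned; the function name and the input layout are illustrative, and nothing here reflects numbers from the paper.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on aligned scores.

    t = mean(d) / (stdev(d) / sqrt(n)), where d are the per-item differences
    and stdev is the sample standard deviation.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be aligned and of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (spread / math.sqrt(n)), n - 1
```

The resulting statistic is then compared against a t distribution with `n - 1` degrees of freedom. Where SciPy is available, `scipy.stats.ttest_rel` computes the same statistic together with a p-value.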
Circularity Check
No circularity: unification presented as empirical framework with external validation
The abstract outlines three mechanisms—supervised learning for embedding quality, RL optimization of embeddings alongside reasoning, and a specialized RL environment for incremental updates without full re-encoding—then claims outperformance via evaluations on diverse benchmarks. No equations, fitted parameters renamed as predictions, or self-citations are shown that would reduce the unification result to its own inputs by construction. The derivation chain is presented as a composite training approach whose success is measured externally rather than derived tautologically from the mechanisms themselves.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "specialized RL environment that efficiently handles evolving embedding representations without requiring complete corpus re-encoding at each training iteration"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: LogicNat.induction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "GRPO with DCG-scaled reward on query-positive-negative triplets"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.