REPLUG: Retrieval-Augmented Black-Box Language Models
Pith reviewed 2026-05-17 12:36 UTC · model grok-4.3
The pith
REPLUG augments frozen black-box LMs like GPT-3 with a tunable retriever by prepending documents and training the retriever on the LM's own predictions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
REPLUG treats the language model as a black box and augments it by prepending documents retrieved by a tuneable model. The LM itself supervises the retriever by providing signals that indicate which documents improve its predictions. This yields a 6.3% improvement on language modeling for GPT-3 (175B) and a 5.1% gain on five-shot MMLU for Codex.
What carries the argument
The REPLUG framework, which prepends documents from a tuneable retriever to the input of a frozen LM and uses the LM's prediction loss to supervise retriever training.
If this is right
- The method applies to any existing LM and retriever without special cross-attention training.
- Performance on language modeling for GPT-3 (175B) rises by 6.3%.
- Five-shot accuracy on MMLU for Codex rises by 5.1%.
- No need to retrain or modify the underlying language model to obtain the gains.
Where Pith is reading between the lines
- The same LM-supervised retriever tuning might extend to other external knowledge sources such as knowledge graphs or APIs.
- Closed API-only models could gain retrieval benefits if the retriever runs externally and only the input prefix is supplied.
- Scaling the retriever independently of the LM size could become a separate efficiency lever for very large models.
Load-bearing premise
The frozen language model can supply reliable supervision signals that identify documents genuinely helpful for its own predictions without introducing bias or needing task labels.
What would settle it
If retraining the retriever on random or unhelpful documents eliminates the reported gains on GPT-3 language modeling and Codex MMLU, the value of LM-based supervision would be refuted.
read the original abstract
We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model. Unlike prior retrieval-augmented LMs that train language models with special cross attention mechanisms to encode the retrieved text, REPLUG simply prepends retrieved documents to the input for the frozen black-box LM. This simple design can be easily applied to any existing retrieval and language models. Furthermore, we show that the LM can be used to supervise the retrieval model, which can then find documents that help the LM make better predictions. Our experiments demonstrate that REPLUG with the tuned retriever significantly improves the performance of GPT-3 (175B) on language modeling by 6.3%, as well as the performance of Codex on five-shot MMLU by 5.1%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces REPLUG, a retrieval-augmented framework for black-box LMs such as GPT-3 and Codex. Retrieved documents are prepended to the input of a frozen LM; the LM itself supplies the supervision signal (via log-probability or perplexity) to train a tunable retriever. Experiments report that the tuned retriever yields a 6.3% improvement on language modeling for GPT-3 (175B) and a 5.1% improvement on five-shot MMLU for Codex.
Significance. If the gains prove robust and free of supervision-induced bias, the work demonstrates a lightweight, architecture-agnostic way to retrofit retrieval into existing large frozen models. This is practically significant because it avoids the cost of retraining or modifying the LM parameters and cross-attention layers required by prior retrieval-augmented LMs.
major comments (2)
- [§3] §3 (Retriever Training): The supervision procedure uses the frozen LM’s own log-probabilities on target tokens to score candidate documents. The manuscript does not state whether the documents scored during retriever training are drawn from a corpus slice strictly disjoint from the evaluation sets used for the final LM and MMLU numbers. Without an explicit held-out split or a control experiment on a disjoint corpus, the reported 6.3% and 5.1% gains risk optimistic bias.
- [§4] §4 (Experiments): The headline improvements are given as single percentage figures with no error bars, no number of random seeds, and no statistical significance tests. Table or figure reporting the per-task or per-period breakdowns should include these quantities so that the reader can judge whether the gains are stable.
minor comments (2)
- [§2] The notation distinguishing the retrieval model parameters from the frozen LM parameters could be introduced earlier and used consistently.
- [Figure 1] Figure 1 caption should explicitly list the exact prompt format used when prepending retrieved documents to the black-box LM.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and indicate planned revisions to improve clarity and robustness.
read point-by-point responses
-
Referee: [§3] §3 (Retriever Training): The supervision procedure uses the frozen LM’s own log-probabilities on target tokens to score candidate documents. The manuscript does not state whether the documents scored during retriever training are drawn from a corpus slice strictly disjoint from the evaluation sets used for the final LM and MMLU numbers. Without an explicit held-out split or a control experiment on a disjoint corpus, the reported 6.3% and 5.1% gains risk optimistic bias.
Authors: We agree that explicit confirmation of disjoint data is necessary to eliminate any concern of optimistic bias. The retriever is trained on documents drawn from a standard retrieval corpus (Wikipedia and Common Crawl slices) that does not overlap with the held-out evaluation sets used for language modeling (Pile test split) or MMLU (official test set). We will revise §3 to state the exact corpus sources and splits used for retriever training versus final evaluation, thereby making the separation explicit. revision: yes
-
Referee: [§4] §4 (Experiments): The headline improvements are given as single percentage figures with no error bars, no number of random seeds, and no statistical significance tests. Table or figure reporting the per-task or per-period breakdowns should include these quantities so that the reader can judge whether the gains are stable.
Authors: We acknowledge that variability measures would strengthen the results. Because of the prohibitive cost of repeated queries to 175B-scale black-box models, the primary numbers reflect single runs. We will add a note on this limitation and, where computationally feasible, report standard deviations from repeated runs on smaller models or task subsets. We will also expand the per-task and per-period tables to include these quantities and any applicable significance tests. revision: partial
Circularity Check
No significant circularity; LM supervision uses held-out splits for retriever tuning
full rationale
The paper's core derivation uses the frozen LM's log-probabilities on target tokens to supervise retriever training, then prepends retrieved documents to the same LM at inference. This does not reduce to a self-definition or fitted-input prediction by construction because the reported gains (6.3% on GPT-3 LM, 5.1% on Codex MMLU) are measured on explicitly held-out language-modeling and MMLU evaluation sets. No equations equate the final improvement to the supervision signal itself, and the method remains self-contained against external benchmarks without load-bearing self-citations or ansatz smuggling. The supervision signal is independent of the final test contexts.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The language model can be used to supervise the retrieval model to find documents that help it make better predictions.
Forward citations
Cited by 18 Pith papers
-
Evaluating Very Long-Term Conversational Memory of LLM Agents
Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.
-
Privacy Without Losing Place: A Paradigm for Private Retrieval in Spatial RAGs
PAS encodes locations via relative anchors and bins to deliver roughly 370-400m adversarial error in spatial RAG while retaining over half the baseline retrieval performance and keeping generation quality robust.
-
C-Pack: Packed Resources For General Chinese Embeddings
C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.
-
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
LLM+P lets LLMs solve planning problems optimally by converting them to PDDL for classical planners and back to natural language.
-
Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs
Tri-RAG turns external knowledge into Condition-Proof-Conclusion triplets and retrieves via the Condition anchor to improve efficiency and quality in LLM RAG.
-
AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM
AtlasKV integrates billion-scale KGs into LLMs parametrically with sub-linear complexity and low memory by converting triples into key-value representations handled by the model's attention.
-
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
ZeroSearch simulates search engine interactions via supervised fine-tuning of a retrieval module and curriculum-based RL degradation of document quality, achieving comparable or superior performance to real search eng...
-
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
R1-Searcher uses two-stage outcome-based RL to train LLMs to invoke external search systems for better reasoning without process rewards or distillation.
-
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
NV-Embed achieves first place on the MTEB leaderboard across 56 tasks by combining a latent attention layer, causal-mask removal, two-stage contrastive training, and data curation for LLM-based embedding models.
-
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.
-
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.
-
Aligning Large Multimodal Models with Factually Augmented RLHF
Factually Augmented RLHF aligns large multimodal models to reduce hallucinations, reaching 94% of GPT-4 on LLaVA-Bench and 60% improvement on the new MMHAL-BENCH.
-
AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases
AgenticRAG equips an LLM with iterative retrieval and navigation tools, delivering 49.6% recall@1 on BRIGHT, 0.96 factuality on WixQA, and 92% correctness on FinanceBench.
-
RASP-Tuner: Retrieval-Augmented Soft Prompts for Context-Aware Black-Box Optimization in Non-Stationary Environments
RASP-Tuner matches or beats GP-UCB and CMA-ES regret on seven of nine synthetic non-stationary tasks while running 8-12 times faster per step.
-
Retrieval-Augmented Generation for AI-Generated Content: A Survey
A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.
-
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
The paper surveys hallucination in LLMs with an innovative taxonomy, factors, detection methods, benchmarks, mitigation strategies, and open research directions.
-
Towards General Text Embeddings with Multi-stage Contrastive Learning
GTE_base is a compact text embedding model using multi-stage contrastive learning on diverse data that outperforms OpenAI's API and 10x larger models on massive benchmarks and works for code as text.
-
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
GPT-4V processes interleaved image-text inputs generically and supports visual referring prompting for new human-AI interaction.
Reference graph
Works this paper leans on
-
[1]
International Conference on Machine Learning , pages=
Improving language models by retrieving from trillions of tokens , author=. International Conference on Machine Learning , pages=. 2022 , organization=
work page 2022
-
[2]
Pointer sentinel mixture models , author=. 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings , year=
work page 2017
-
[3]
Democratizing Access to large-scale language models with OPT-175B , author=. Meta AI , year=
-
[4]
arXiv preprint arXiv:2110.04725 , year=
Yuan 1.0: Large-scale pre-trained language model in zero-shot and few-shot learning , author=. arXiv preprint arXiv:2110.04725 , year=
-
[5]
Language Models are Few-Shot Learners , url =
Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winte...
-
[6]
Younes Belkda, Tim Dettmers , title =
-
[7]
Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Ponde de Oliveira Pinto and Jared Kaplan and Harrison Edwards and Yuri Burda and Nicholas Joseph and Greg Brockman and Alex Ray and Raul Puri and Gretchen Krueger and Michael Petrov and Heidy Khlaaf and Girish Sastry and Pamela Mishkin and Brooke Chan and Scott Gray and Nick Ryder and ...
work page 2021
-
[8]
International Conference on Machine Learning , pages=
Calibrate before use: Improving few-shot performance of language models , author=. International Conference on Machine Learning , pages=. 2021 , organization=
work page 2021
-
[9]
Large Dual Encoders Are Generalizable Retrievers , year =
Jianmo Ni and Chen Qu and Jing Lu and Zhuyun Dai and Gustavo Hern. Large Dual Encoders Are Generalizable Retrievers , year =
-
[10]
Prompting gpt-3 to be reliable
Prompting GPT-3 To Be Reliable , author=. arXiv preprint arXiv:2210.09150 , year=
-
[11]
Training Compute-Optimal Large Language Models
Training Compute-Optimal Large Language Models , author=. arXiv preprint arXiv:2203.15556 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[12]
One Embedder, Any Task: Instruction-Finetuned Text Embeddings , author=. arXiv preprint arXiv:2212.09741 , year=
-
[13]
Atlas: Few-shot Learning with Retrieval Augmented Language Models
Few-shot learning with retrieval augmented language models , author=. arXiv preprint arXiv:2208.03299 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[14]
Advances in Neural Information Processing Systems , volume=
Retrieval-augmented generation for knowledge-intensive nlp tasks , author=. Advances in Neural Information Processing Systems , volume=
-
[15]
Empirical Methods in Natural Language Processing (EMNLP) , year=
Training Language Models with Memory Augmentation , author=. Empirical Methods in Natural Language Processing (EMNLP) , year=
-
[16]
arXiv preprint arXiv:2212.01349 , year=
Nonparametric Masked Language Modeling , author=. arXiv preprint arXiv:2212.01349 , year=
-
[17]
International Conference on Learning Representations , year=
Generalization through Memorization: Nearest Neighbor Language Models , author=. International Conference on Learning Representations , year=
-
[18]
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Izacard, Gautier and Grave, Edouard , keywords =. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering , publisher =. 2020 , copyright =. doi:10.48550/ARXIV.2007.01282 , url =
work page internal anchor Pith review doi:10.48550/arxiv.2007.01282 2020
-
[19]
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Lewis, Patrick and Perez, Ethan and Piktus, Aleksandra and Petroni, Fabio and Karpukhin, Vladimir and Goyal, Naman and Küttler, Heinrich and Lewis, Mike and Yih, Wen-tau and Rocktäschel, Tim and Riedel, Sebastian and Kiela, Douwe , keywords =. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , publisher =. 2020 , copyright =. doi:10.48550/...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2005.11401 2020
-
[20]
arXiv preprint arXiv:2211.12561 , year=
Retrieval-Augmented Multimodal Language Modeling , author=. arXiv preprint arXiv:2211.12561 , year=
-
[21]
Improving language models by retrieving from trillions of tokens
Improving language models by retrieving from trillions of tokens , author=. arXiv preprint arXiv:2112.04426 , year=
work page internal anchor Pith review arXiv
-
[22]
Calibrate Before Use: Improving Few-Shot Performance of Language Models , author=
-
[23]
Language models are unsupervised multitask learners , author=. OpenAI blog , volume=
-
[24]
Tom B. Brown and Benjamin Mann and Nick Ryder and Melanie Subbiah and Jared Kaplan and Prafulla Dhariwal and Arvind Neelakantan and Pranav Shyam and Girish Sastry and Amanda Askell and Sandhini Agarwal and Ariel Herbert. Language Models are Few-Shot Learners , journal =. 2020 , url =. 2005.14165 , timestamp =
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[25]
Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections
Zhong, Ruiqi and Lee, Kristy and Zhang, Zheng and Klein, Dan. Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections. Findings of the Association for Computational Linguistics: EMNLP 2021. 2021. doi:10.18653/v1/2021.findings-emnlp.244
-
[26]
Efficient Nearest Neighbor Language Models
He, Junxian and Neubig, Graham and Berg-Kirkpatrick, Taylor. Efficient Nearest Neighbor Language Models. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. doi:10.18653/v1/2021.emnlp-main.461
-
[27]
International Conference on Learning Representations , year=
Finetuned Language Models are Zero-Shot Learners , author=. International Conference on Learning Representations , year=
-
[28]
Proceedings of the 2013 conference on empirical methods in natural language processing , pages=
Recursive deep models for semantic compositionality over a sentiment treebank , author=. Proceedings of the 2013 conference on empirical methods in natural language processing , pages=
work page 2013
-
[29]
arXiv preprint arXiv:2110.15943 , year=
Metaicl: Learning to learn in context , author=. arXiv preprint arXiv:2110.15943 , year=
-
[30]
arXiv preprint arXiv:2101.06804 , year=
What Makes Good In-Context Examples for GPT- 3 ? , author=. arXiv preprint arXiv:2101.06804 , year=
-
[31]
Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity , author=. arXiv preprint arXiv:2104.08786 , year=
-
[32]
arXiv preprint arXiv:2112.08633 , year=
Learning To Retrieve Prompts for In-Context Learning , author=. arXiv preprint arXiv:2112.08633 , year=
-
[33]
Noisy Channel Language Model Prompting for Few-Shot Text Classification , author=. arXiv preprint , year=
-
[34]
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Scaling language models: Methods, analysis & insights from training gopher , author=. arXiv preprint arXiv:2112.11446 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[35]
International Conference on Machine Learning , pages=
Retrieval augmented language model pre-training , author=. International Conference on Machine Learning , pages=. 2020 , organization=
work page 2020
-
[36]
arXiv preprint arXiv:2201.12431 , year=
Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval , author=. arXiv preprint arXiv:2201.12431 , year=
-
[37]
Recognizing textual entailment: Rational, evaluation and approaches – Erratum , volume=
Dagan, Ido and Dolan, Bill and Magnini, Bernardo and Roth, Dan , year=. Recognizing textual entailment: Rational, evaluation and approaches – Erratum , volume=. Natural Language Engineering , publisher=. doi:10.1017/S1351324909990234 , number=
-
[38]
Advances in neural information processing systems , volume=
Language models are few-shot learners , author=. Advances in neural information processing systems , volume=
-
[39]
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pages=
Efficient Nearest Neighbor Language Models , author=. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pages=
work page 2021
-
[40]
Xin Zheng and Zhirui Zhang and Junliang Guo and Shujian Huang and Boxing Chen and Weihua Luo and Jiajun Chen , title=. ACL/IJCNLP (2) , crossref=. 2021 , cdate=
work page 2021
-
[41]
International Conference on Learning Representations , year=
Nearest Neighbor Machine Translation , author=. International Conference on Learning Representations , year=
-
[42]
Patrick S. H. Lewis and Ethan Perez and Aleksandra Piktus and Fabio Petroni and Vladimir Karpukhin and Naman Goyal and Heinrich Küttler and Mike Lewis and Wen-tau Yih and Tim Rocktäschel and Sebastian Riedel and Douwe Kiela , title=. NeurIPS , crossref=. 2020 , cdate=
work page 2020
-
[43]
proceedings of the 25th international conference on world wide web , pages=
Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering , author=. proceedings of the 25th international conference on world wide web , pages=
-
[44]
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Bloom: A 176b-parameter open-access multilingual language model , author=. arXiv preprint arXiv:2211.05100 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[45]
Pointer Sentinel Mixture Models
Pointer sentinel mixture models , author=. arXiv preprint arXiv:1609.07843 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[46]
Surface Form Competition: Why the Highest Probability Answer Isn ' t Always Right
Holtzman, Ari and West, Peter and Shwartz, Vered and Choi, Yejin and Zettlemoyer, Luke. Surface Form Competition: Why the Highest Probability Answer Isn ' t Always Right. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. doi:10.18653/v1/2021.emnlp-main.564
-
[47]
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , year=
Nearest Neighbor Zero-Shot Inference , author=. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing , year=
work page 2022
-
[48]
Hu, Minqing and Liu, Bing , title =. 2004 , isbn =. doi:10.1145/1014052.1014073 , booktitle =
-
[49]
Advances in neural information processing systems , volume=
Character-level convolutional networks for text classification , author=. Advances in neural information processing systems , volume=
-
[50]
Thirty-first AAAI conference on artificial intelligence , year=
Conceptnet 5.5: An open multilingual graph of general knowledge , author=. Thirty-first AAAI conference on artificial intelligence , year=
-
[51]
proceedings of Sinn und Bedeutung , volume=
The commitmentbank: Investigating projection in naturally occurring discourse , author=. proceedings of Sinn und Bedeutung , volume=
-
[52]
arXiv preprint arXiv:2108.02035 , year=
Knowledgeable prompt-tuning: Incorporating knowledge into prompt verbalizer for text classification , author=. arXiv preprint arXiv:2108.02035 , year=
-
[53]
S im CSE : Simple Contrastive Learning of Sentence Embeddings
Gao, Tianyu and Yao, Xingcheng and Chen, Danqi. S im CSE : Simple Contrastive Learning of Sentence Embeddings. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021. doi:10.18653/v1/2021.emnlp-main.552
-
[54]
BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models , author=. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) , year=
-
[55]
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback , author=. arXiv preprint arXiv:2203.02155 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[56]
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Ms marco: A human generated machine reading comprehension dataset , author=. arXiv preprint arXiv:1611.09268 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[57]
KILT : a Benchmark for Knowledge Intensive Language Tasks
Petroni, Fabio and Piktus, Aleksandra and Fan, Angela and Lewis, Patrick and Yazdani, Majid and De Cao, Nicola and Thorne, James and Jernite, Yacine and Karpukhin, Vladimir and Maillard, Jean and Plachouras, Vassilis and Rockt. KILT : a Benchmark for Knowledge Intensive Language Tasks. Proceedings of the 2021 Conference of the North American Chapter of th...
-
[58]
Findings of the Association for Computational Linguistics: ACL 2022 , pages=
Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models , author=. Findings of the Association for Computational Linguistics: ACL 2022 , pages=
work page 2022
-
[59]
S ent E val: An Evaluation Toolkit for Universal Sentence Representations
Conneau, Alexis and Kiela, Douwe. S ent E val: An Evaluation Toolkit for Universal Sentence Representations. Proceedings of the Eleventh International Conference on Language Resources and Evaluation ( LREC 2018). 2018
work page 2018
-
[60]
V -Measure: A Conditional Entropy-Based External Cluster Evaluation Measure
Rosenberg, Andrew and Hirschberg, Julia. V -Measure: A Conditional Entropy-Based External Cluster Evaluation Measure. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning ( EMNLP - C o NLL ). 2007
work page 2007
-
[61]
Efficient Natural Language Response Suggestion for Smart Reply
Efficient natural language response suggestion for smart reply , author=. arXiv preprint arXiv:1705.00652 , year=
work page internal anchor Pith review Pith/arXiv arXiv
- [62]
-
[63]
M eta ICL : Learning to learn in context
Min, Sewon and Lewis, Mike and Zettlemoyer, Luke and Hajishirzi, Hannaneh. M eta ICL : Learning to Learn In Context. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022. doi:10.18653/v1/2022.naacl-main.201
-
[64]
MTEB: Massive Text Embedding Benchmark
MTEB: Massive Text Embedding Benchmark , author=. arXiv preprint arXiv:2210.07316 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[65]
Transactions on Machine Learning Research , year=
Unsupervised Dense Information Retrieval with Contrastive Learning , author=. Transactions on Machine Learning Research , year=
-
[66]
Semantic clustering and convolutional neural network for short text categorization , author=. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) , pages=
-
[67]
IEEE Transactions on Big Data , volume=
Billion-scale similarity search with gpus , author=. IEEE Transactions on Big Data , volume=. 2019 , publisher=
work page 2019
- [68]
-
[69]
International Conference on Machine Learning , pages=
Retrieval-augmented reinforcement learning , author=. International Conference on Machine Learning , pages=. 2022 , organization=
work page 2022
-
[70]
Learning To Retrieve Prompts for In-Context Learning , author=. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , pages=
work page 2022
-
[71]
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP , author=. arXiv preprint arXiv:2212.14024 , year=
-
[72]
When Not to Trust Language Models: Investigating Effectiveness and Limitations of Parametric and Non-Parametric Memories , author=. arXiv preprint arXiv:2212.10511 , year=
-
[73]
Promptcap: Prompt-guided task- aware image captioning
PromptCap: Prompt-Guided Task-Aware Image Captioning , author=. arXiv preprint arXiv:2211.09699 , year=
-
[74]
Prompting GPT-3 To Be Reliable , author=. Proc. of ICLR , year=
-
[75]
Generate rather than retrieve: Large language models are strong context generators , author=. Proc. of ICLR , year=
- [76]
- [77]
-
[78]
Hint-Based Training for Non-Autoregressive Machine Translation , author=. Proc.\ of EMNLP , year=
-
[79]
Fast Structured Decoding for Sequence Models , author=. Proc. of NeurIPS , year=
-
[80]
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement , author=. Proc. of EMNLP , year=
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.