citation dossier

Realm: Retrieval-augmented language model pre-training

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming - Wei Chang · 2002 · arXiv 2002.08909

17Pith papers citing it

17reference links

cs.CLtop field · 13 papers

UNVERDICTEDtop verdict bucket · 13 papers

This arXiv-backed work is queued for full Pith review when it crosses the high-inbound sweep. That review runs reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read on arXiv PDF

why this work matters in Pith

Pith has found this work in 17 reviewed papers. Its strongest current cluster is cs.CL (13 papers). The largest review-status bucket among citing papers is UNVERDICTED (13 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.

representative citing papers

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

cs.CL · 2023-10-05 · conditional · novelty 8.0

DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.

Language Models are Few-Shot Learners

cs.CL · 2020-05-28 · accept · novelty 8.0

GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

Bridging the Long-Tail Gap: Robust Retrieval-Augmented Relation Completion via Multi-Stage Paraphrase Infusion

cs.CL · 2026-04-24 · unverdicted · novelty 7.0

RC-RAG boosts long-tail relation completion by infusing paraphrases into RAG stages, yielding up to 40.6 EM gains on benchmarks across five LLMs with no fine-tuning.

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

cs.LG · 2021-01-11 · accept · novelty 7.0

Switch Transformers use top-1 expert routing in a Mixture of Experts setup to scale to trillion-parameter language models with constant compute and up to 4x speedup over T5-XXL.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

cs.CL · 2020-05-22 · accept · novelty 7.0

RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.

Procedural Knowledge at Scale Improves Reasoning

cs.CL · 2026-04-01 · unverdicted · novelty 6.0

Reasoning Memory decomposes reasoning trajectories into 32 million subquestion-subroutine pairs and retrieves them via in-thought prompts to improve language model performance on math, science, and coding benchmarks by up to 19.2%.

Improving Factuality and Reasoning in Language Models through Multiagent Debate

cs.CL · 2023-05-23 · unverdicted · novelty 6.0

Multiagent debate among LLMs improves mathematical reasoning, strategic reasoning, and factual accuracy while reducing hallucinations.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

Emergent Abilities of Large Language Models

cs.CL · 2022-06-15 · unverdicted · novelty 6.0

Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

cs.CL · 2022-04-12 · unverdicted · novelty 6.0

RLHF alignment training on language models boosts NLP performance, supports skill specialization, enables weekly online updates with fresh human data, and shows a linear relation between RL reward and sqrt(KL divergence from initialization.

ST-MoE: Designing Stable and Transferable Sparse Expert Models

cs.CL · 2022-02-17 · unverdicted · novelty 6.0

ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost of a 32B dense model.

LaMDA: Language Models for Dialog Applications

cs.CL · 2022-01-20 · unverdicted · novelty 6.0

LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.

Unsupervised Dense Information Retrieval with Contrastive Learning

cs.IR · 2021-12-16 · unverdicted · novelty 6.0

Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.

Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use

cs.CR · 2026-05-06 · unverdicted · novelty 5.0

A server-side architecture with policy-aware ingestion and ABAC-based retrieval gating prevents cross-tenant data leakage in multitenant enterprise RAG and agent systems.

Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering

cs.CL · 2026-04-27 · unverdicted · novelty 4.0

Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.

KnowPilot: Your Knowledge-Driven Copilot for Domain Tasks

cs.SE · 2026-04-19 · unverdicted · novelty 4.0

KnowPilot integrates knowledge retrieval and memory systems into generative agents to achieve better results on domain-specific tasks such as text generation.

A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering

cs.CL · 2026-04-08 · unverdicted · novelty 4.0

Dense retrieval plus query reformulation and reranking reaches 60.49% accuracy on MedQA USMLE, outperforming other setups while domain-specialized models make better use of the retrieved evidence.

citing papers explorer

Showing 17 of 17 citing papers.

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines cs.CL · 2023-10-05 · conditional · none · ref 19
DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.
Language Models are Few-Shot Learners cs.CL · 2020-05-28 · accept · none · ref 16
GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.
Bridging the Long-Tail Gap: Robust Retrieval-Augmented Relation Completion via Multi-Stage Paraphrase Infusion cs.CL · 2026-04-24 · unverdicted · none · ref 2
RC-RAG boosts long-tail relation completion by infusing paraphrases into RAG stages, yielding up to 40.6 EM gains on benchmarks across five LLMs with no fine-tuning.
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity cs.LG · 2021-01-11 · accept · none · ref 12
Switch Transformers use top-1 expert routing in a Mixture of Experts setup to scale to trillion-parameter language models with constant compute and up to 4x speedup over T5-XXL.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks cs.CL · 2020-05-22 · accept · none · ref 23
RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.
Procedural Knowledge at Scale Improves Reasoning cs.CL · 2026-04-01 · unverdicted · none · ref 8
Reasoning Memory decomposes reasoning trajectories into 32 million subquestion-subroutine pairs and retrieves them via in-thought prompts to improve language model performance on math, science, and coding benchmarks by up to 19.2%.
Improving Factuality and Reasoning in Language Models through Multiagent Debate cs.CL · 2023-05-23 · unverdicted · none · ref 7
Multiagent debate among LLMs improves mathematical reasoning, strategic reasoning, and factual accuracy while reducing hallucinations.
Language Models (Mostly) Know What They Know cs.CL · 2022-07-11 · unverdicted · none · ref 50
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
Emergent Abilities of Large Language Models cs.CL · 2022-06-15 · unverdicted · none · ref 31
Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback cs.CL · 2022-04-12 · unverdicted · none · ref 7
RLHF alignment training on language models boosts NLP performance, supports skill specialization, enables weekly online updates with fresh human data, and shows a linear relation between RL reward and sqrt(KL divergence from initialization.
ST-MoE: Designing Stable and Transferable Sparse Expert Models cs.CL · 2022-02-17 · unverdicted · none · ref 75
ST-MoE introduces stability techniques for sparse expert models, allowing a 269B-parameter model to achieve state-of-the-art transfer learning results across reasoning, summarization, and QA tasks at the compute cost of a 32B dense model.
LaMDA: Language Models for Dialog Applications cs.CL · 2022-01-20 · unverdicted · none · ref 36
LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.
Unsupervised Dense Information Retrieval with Contrastive Learning cs.IR · 2021-12-16 · unverdicted · none · ref 134
Contrastive learning trains unsupervised dense retrievers that beat BM25 on most BEIR datasets and support cross-lingual retrieval across scripts.
Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use cs.CR · 2026-05-06 · unverdicted · none · ref 10
A server-side architecture with policy-aware ingestion and ABAC-based retrieval gating prevents cross-tenant data leakage in multitenant enterprise RAG and agent systems.
Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering cs.CL · 2026-04-27 · unverdicted · none · ref 15
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
KnowPilot: Your Knowledge-Driven Copilot for Domain Tasks cs.SE · 2026-04-19 · unverdicted · none · ref 6
KnowPilot integrates knowledge retrieval and memory systems into generative agents to achieve better results on domain-specific tasks such as text generation.
A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering cs.CL · 2026-04-08 · unverdicted · none · ref 25
Dense retrieval plus query reformulation and reranking reaches 60.49% accuracy on MedQA USMLE, outperforming other setups while domain-specialized models make better use of the retrieved evidence.

Realm: Retrieval-augmented language model pre-training

why this work matters in Pith

fields

years

verdicts

representative citing papers

citing papers explorer