Improving language models by retrieving from trillions of tokens

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican · 2021 · cs.CL · arXiv 2112.04426

29 Pith papers cite this work. Polarity classification is still indexing.

abstract

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
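
The chunk-level retrieval step described in the abstract can be illustrated with a toy sketch. This is not the RETRO implementation: the chunk size, the bag-of-words `embed` function (a stand-in for the frozen retriever), and all function names are assumptions made for illustration, and the cross-attention that consumes the retrieved neighbours is omitted.

```python
import math
from collections import Counter

CHUNK_SIZE = 4  # hypothetical value, kept small for illustration only


def split_into_chunks(tokens, size=CHUNK_SIZE):
    """Split a token list into consecutive fixed-size chunks."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def embed(chunk):
    """Toy stand-in for a frozen encoder: an L2-normalised bag of words."""
    counts = Counter(chunk)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Dot product of two L2-normalised sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def retrieve(query_chunk, database, k=2):
    """Return the k database chunks most similar to query_chunk."""
    q = embed(query_chunk)
    ranked = sorted(database, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical retrieval database, built by chunking a tiny corpus.
database = split_into_chunks(
    "the cat sat on the mat while the dog slept on the rug".split()
)
prompt = "a cat sat on a chair".split()

# One neighbour list per prompt chunk; a retrieval-enhanced model would
# condition its predictions for later tokens on these neighbours.
neighbours = [retrieve(chunk, database, k=1) for chunk in split_into_chunks(prompt)]
```

At real scale the linear scan inside `retrieve` would be replaced by an approximate nearest-neighbour index, and the embeddings would come from a pretrained encoder rather than word counts.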

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation

cs.IR · 2026-04-15 · unverdicted · novelty 7.0

A single model unifies retrieval and context compression for on-device RAG via shared representations, matching traditional RAG performance at 1/10 context size with no extra storage.

A Generalist Agent

cs.AI · 2022-05-12 · accept · novelty 7.0

Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.

OPT: Open Pre-trained Transformer Language Models

cs.CL · 2022-05-02 · unverdicted · novelty 7.0

OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

A Hippocampus for Linear Attention: An Exact Memory for What the Recurrent State Forgets

cs.AI · 2026-07-02 · unverdicted · novelty 6.0

HOLA pairs a compressive delta-rule recurrent state with a residual-selected exact KV cache and decoupled RMSNorm-gamma read, yielding lower perplexity than both standard linear attention and full-attention baselines on Wikitext and LAMBADA plus stronger needle-in-haystack recall.

Memory-Managed Long-Context Attention: A Preliminary Study of Editable Request-Local Memory

cs.CL · 2026-06-27 · unverdicted · novelty 6.0

A hybrid attention mechanism with editable request-local memory slots and sparse fallback achieves high accuracy on synthetic overwrite, version, and anti-pollution tasks where pure fixed-state or sparse methods fail, while identifying open-domain selection as the remaining bottleneck.

Substrate Asymmetry in User-Side Memory: A Diagnostic Framework

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

User memory in LLMs factors into three orthogonal axes where parametric adapters and retrieval show opposite strengths, with causal evidence from attention interventions and an alignment tax on RLHF models.

Grounded Decoding: Retrieval-Anchored Probability Fusion for Faithful RAG

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

Grounded Decoding fuses full-RAG and retrieval-only next-token distributions via normalized geometric mean from a KL-barycenter to improve factual consistency and citation quality in RAG.

Tensor Memory: Fixed-Size Recurrent State for Long-Horizon Transformers

cs.CV · 2026-05-26 · unverdicted · novelty 6.0

Tensor Memory augments Transformers with a constant-size 3D voxel grid using differentiable soft writes at predicted locations, local interaction, and gated recurrent dynamics to decouple memory capacity from sequence length.

Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

cs.AI · 2026-05-21 · unverdicted · novelty 6.0 · 2 refs

Search-E1 uses GRPO interleaved with on-policy self-distillation to reach 0.440 average EM on seven QA benchmarks with Qwen2.5-3B, outperforming open-source baselines.

SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning

cs.AI · 2026-05-18 · unverdicted · novelty 6.0

SD-Search derives step-level supervision for search queries in reasoning agents via on-policy hindsight self-distillation using the policy as both student and teacher.

When AI reviews science: Can we trust the referee?

cs.AI · 2026-04-26 · unverdicted · novelty 6.0

AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.

RAG-GNN: Integrating Retrieved Knowledge with Graph Neural Networks for Precision Medicine

q-bio.MN · 2026-01-31 · unverdicted · novelty 6.0

RAG-GNN augments GNNs with retrieved literature knowledge via gated fusion to improve functional clustering of 379 proteins in cancer signaling networks, raising silhouette score by 0.093.

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

cs.CL · 2024-01-31 · unverdicted · novelty 6.0

RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

cs.CL · 2023-05-03 · conditional · novelty 6.0

Distilling step-by-step uses LLM-generated rationales as additional supervision in a multi-task framework so that 770M-parameter models outperform 540B-parameter models on NLP benchmarks with only 80% of the data.

REPLUG: Retrieval-Augmented Black-Box Language Models

cs.CL · 2023-01-30 · conditional · novelty 6.0

REPLUG improves frozen black-box LMs by prepending LM-supervised retrieved documents, delivering 6.3% better language modeling on GPT-3 and 5.1% better five-shot MMLU on Codex.

Atlas: Few-shot Learning with Retrieval Augmented Language Models

cs.CL · 2022-08-05 · unverdicted · novelty 6.0 · 2 refs

Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.

Language Models (Mostly) Know What They Know

cs.CL · 2022-07-11 · unverdicted · novelty 6.0

Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

Emergent Abilities of Large Language Models

cs.CL · 2022-06-15 · unverdicted · novelty 6.0

Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

cs.CL · 2022-04-12 · unverdicted · novelty 6.0

RLHF alignment training on language models boosts NLP performance, supports skill specialization, enables weekly online updates with fresh human data, and shows a linear relation between RL reward and sqrt(KL divergence from initialization).

PaLM: Scaling Language Modeling with Pathways

cs.CL · 2022-04-05 · accept · novelty 6.0

PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.

LaMDA: Language Models for Dialog Applications

cs.CL · 2022-01-20 · unverdicted · novelty 6.0

LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.

PaperClaw: Harnessing Agents for Autonomous Research and Human-in-the-Loop Refinement

cs.AI · 2026-06-21 · unverdicted · novelty 5.0

PAPERCLAW is a multi-agent system for end-to-end autonomous research paper generation from literature to output, with human refinement and LLM-judge evaluation showing strong results.

MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision

cs.CL · 2026-06-15 · unverdicted · novelty 5.0

MemSlides introduces a three-part memory hierarchy (user profile, working, tool) with scoped local revision for multi-turn personalized slide generation.

Small Language Models are the Future of Agentic AI

cs.AI · 2025-06-02 · unverdicted · novelty 5.0

Small language models are sufficiently capable, more suitable, and far more economical than large models for the repetitive tasks that dominate agentic AI systems.

citing papers explorer

Showing 29 of 29 citing papers.

A Unified Model and Document Representation for On-Device Retrieval-Augmented Generation cs.IR · 2026-04-15 · unverdicted · none · ref 4 · internal anchor
A single model unifies retrieval and context compression for on-device RAG via shared representations, matching traditional RAG performance at 1/10 context size with no extra storage.
A Generalist Agent cs.AI · 2022-05-12 · accept · none · ref 11 · internal anchor
Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.
OPT: Open Pre-trained Transformer Language Models cs.CL · 2022-05-02 · unverdicted · none · ref 272 · internal anchor
OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.
A Hippocampus for Linear Attention: An Exact Memory for What the Recurrent State Forgets cs.AI · 2026-07-02 · unverdicted · none · ref 3 · internal anchor
HOLA pairs a compressive delta-rule recurrent state with a residual-selected exact KV cache and decoupled RMSNorm-gamma read, yielding lower perplexity than both standard linear attention and full-attention baselines on Wikitext and LAMBADA plus stronger needle-in-haystack recall.
Memory-Managed Long-Context Attention: A Preliminary Study of Editable Request-Local Memory cs.CL · 2026-06-27 · unverdicted · none · ref 33 · internal anchor
A hybrid attention mechanism with editable request-local memory slots and sparse fallback achieves high accuracy on synthetic overwrite, version, and anti-pollution tasks where pure fixed-state or sparse methods fail, while identifying open-domain selection as the remaining bottleneck.
Substrate Asymmetry in User-Side Memory: A Diagnostic Framework cs.CL · 2026-06-10 · unverdicted · none · ref 41 · internal anchor
User memory in LLMs factors into three orthogonal axes where parametric adapters and retrieval show opposite strengths, with causal evidence from attention interventions and an alignment tax on RLHF models.
Grounded Decoding: Retrieval-Anchored Probability Fusion for Faithful RAG cs.LG · 2026-05-29 · unverdicted · none · ref 31 · internal anchor
Grounded Decoding fuses full-RAG and retrieval-only next-token distributions via normalized geometric mean from a KL-barycenter to improve factual consistency and citation quality in RAG.
Tensor Memory: Fixed-Size Recurrent State for Long-Horizon Transformers cs.CV · 2026-05-26 · unverdicted · none · ref 4 · internal anchor
Tensor Memory augments Transformers with a constant-size 3D voxel grid using differentiable soft writes at predicted locations, local interaction, and gated recurrent dynamics to decouple memory capacity from sequence length.
Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning cs.AI · 2026-05-21 · unverdicted · none · ref 1 · 2 links · internal anchor
Search-E1 uses GRPO interleaved with on-policy self-distillation to reach 0.440 average EM on seven QA benchmarks with Qwen2.5-3B, outperforming open-source baselines.
SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning cs.AI · 2026-05-18 · unverdicted · none · ref 3 · internal anchor
SD-Search derives step-level supervision for search queries in reasoning agents via on-policy hindsight self-distillation using the policy as both student and teacher.
When AI reviews science: Can we trust the referee? cs.AI · 2026-04-26 · unverdicted · none · ref 100 · internal anchor
AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.
RAG-GNN: Integrating Retrieved Knowledge with Graph Neural Networks for Precision Medicine q-bio.MN · 2026-01-31 · unverdicted · none · ref 21 · internal anchor
RAG-GNN augments GNNs with retrieved literature knowledge via gated fusion to improve functional clustering of 379 proteins in cancer signaling networks, raising silhouette score by 0.093.
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval cs.CL · 2024-01-31 · unverdicted · none · ref 7 · internal anchor
RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes cs.CL · 2023-05-03 · conditional · none · ref 22 · internal anchor
Distilling step-by-step uses LLM-generated rationales as additional supervision in a multi-task framework so that 770M-parameter models outperform 540B-parameter models on NLP benchmarks with only 80% of the data.
REPLUG: Retrieval-Augmented Black-Box Language Models cs.CL · 2023-01-30 · conditional · none · ref 21 · internal anchor
REPLUG improves frozen black-box LMs by prepending LM-supervised retrieved documents, delivering 6.3% better language modeling on GPT-3 and 5.1% better five-shot MMLU on Codex.
Atlas: Few-shot Learning with Retrieval Augmented Language Models cs.CL · 2022-08-05 · unverdicted · none · ref 74 · 2 links · internal anchor
Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.
Language Models (Mostly) Know What They Know cs.CL · 2022-07-11 · unverdicted · none · ref 30 · internal anchor
Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
Emergent Abilities of Large Language Models cs.CL · 2022-06-15 · unverdicted · none · ref 12 · internal anchor
Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback cs.CL · 2022-04-12 · unverdicted · none · ref 4 · internal anchor
RLHF alignment training on language models boosts NLP performance, supports skill specialization, enables weekly online updates with fresh human data, and shows a linear relation between RL reward and sqrt(KL divergence from initialization).
PaLM: Scaling Language Modeling with Pathways cs.CL · 2022-04-05 · accept · none · ref 19 · internal anchor
PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.
LaMDA: Language Models for Dialog Applications cs.CL · 2022-01-20 · unverdicted · none · ref 41 · internal anchor
LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.
PaperClaw: Harnessing Agents for Autonomous Research and Human-in-the-Loop Refinement cs.AI · 2026-06-21 · unverdicted · none · ref 54 · internal anchor
PAPERCLAW is a multi-agent system for end-to-end autonomous research paper generation from literature to output, with human refinement and LLM-judge evaluation showing strong results.
MemSlides: A Hierarchical Memory Driven Agent Framework for Personalized Slide Generation with Multi-turn Local Revision cs.CL · 2026-06-15 · unverdicted · none · ref 3 · internal anchor
MemSlides introduces a three-part memory hierarchy (user profile, working, tool) with scoped local revision for multi-turn personalized slide generation.
Small Language Models are the Future of Agentic AI cs.AI · 2025-06-02 · unverdicted · none · ref 10 · internal anchor
Small language models are sufficiently capable, more suitable, and far more economical than large models for the repetitive tasks that dominate agentic AI systems.
Galactica: A Large Language Model for Science cs.CL · 2022-11-16 · unverdicted · none · ref 25 · 2 links · internal anchor
Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.
Reducing Redundancy in Retrieval-Augmented Generation through Chunk Filtering cs.CL · 2026-04-27 · unverdicted · none · ref 4 · internal anchor
Entity-based chunk filtering reduces RAG vector index size by 25-36% with retrieval quality near baseline levels.
KnowPilot: Your Knowledge-Driven Copilot for Domain Tasks cs.SE · 2026-04-19 · unverdicted · none · ref 2 · internal anchor
KnowPilot integrates knowledge retrieval and memory systems into generative agents to achieve better results on domain-specific tasks such as text generation.
Less LLM, More Documents: Searching for Improved RAG cs.IR · 2025-10-03 · unverdicted · none · ref 2 · internal anchor
Corpus scaling in RAG frequently matches the accuracy gains from larger LLMs on open-domain QA tasks, with mid-sized models benefiting most due to better passage coverage.
Context Recycling for Long-Horizon LLM Inference cs.CL · 2026-05-01 · unverdicted · none · ref 2 · internal anchor
ContextForge recycles context in long-horizon LLM tasks via query generation, memory retrieval, and synthesis, yielding reduced token use and improved consistency on a 15-turn healthcare benchmark while preserving accuracy.
