Generalization through memorization: Nearest neighbor language models

Generalization through memorization: Nearest neighbor language models , author= · 1911 · arXiv 1911.00172

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

cs.CV · 2023-03-28 · conditional · novelty 7.0

LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

cs.CL · 2019-09-17 · unverdicted · novelty 7.0

Intra-layer model parallelism in PyTorch enables training of 8.3B-parameter transformers, achieving SOTA perplexity of 10.8 on WikiText103 and 66.5% accuracy on LAMBADA.

Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Memory Inception is a training-free method that injects latent KV banks at chosen layers to steer LLMs, achieving superior control-drift balance and up to 118x storage reduction on personality and structured-reasoning tasks.

Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks

cs.CR · 2026-04-22 · unverdicted · novelty 6.0

A context-aware Sentinel-Strategist system for RAG selectively applies defenses to block membership inference and data poisoning while recovering most retrieval utility compared to always-on defense stacks.

Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts

cs.CL · 2026-04-09 · conditional · novelty 6.0

Loss-based pruning of training data to limit facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-parameter model on the full dataset.

LaMDA: Language Models for Dialog Applications

cs.CL · 2022-01-20 · unverdicted · novelty 6.0

LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.

TIDE: Every Layer Knows the Token Beneath the Context

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.

FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation

cs.LG · 2026-05-06 · unverdicted · novelty 5.0 · 2 refs

FAAST performs test-time supervised adaptation by analytically deriving fast weights from examples in one forward pass, matching backprop performance with over 90% less adaptation time and up to 95% memory savings versus memory-based methods.

Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation

cs.SE · 2026-04-16 · unverdicted · novelty 3.0

RAG-enhanced LLMs show generally positive effects on automated test generation and code inspection by supplying supplementary context that reduces hallucinations.

citing papers explorer

Showing 9 of 9 citing papers.

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention cs.CV · 2023-03-28 · conditional · none · ref 267
LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism cs.CL · 2019-09-17 · unverdicted · none · ref 13
Intra-layer model parallelism in PyTorch enables training of 8.3B-parameter transformers, achieving SOTA perplexity of 10.8 on WikiText103 and 66.5% accuracy on LAMBADA.
Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs cs.LG · 2026-05-07 · unverdicted · none · ref 16 · 2 links
Memory Inception is a training-free method that injects latent KV banks at chosen layers to steer LLMs, achieving superior control-drift balance and up to 118x storage reduction on personality and structured-reasoning tasks.
Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks cs.CR · 2026-04-22 · unverdicted · none · ref 21
A context-aware Sentinel-Strategist system for RAG selectively applies defenses to block membership inference and data poisoning while recovering most retrieval utility compared to always-on defense stacks.
Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts cs.CL · 2026-04-09 · conditional · none · ref 45
Loss-based pruning of training data to limit facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-parameter model on the full dataset.
LaMDA: Language Models for Dialog Applications cs.CL · 2022-01-20 · unverdicted · none · ref 34
LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.
TIDE: Every Layer Knows the Token Beneath the Context cs.CL · 2026-05-07 · unverdicted · none · ref 99
TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.
FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation cs.LG · 2026-05-06 · unverdicted · none · ref 11 · 2 links
FAAST performs test-time supervised adaptation by analytically deriving fast weights from examples in one forward pass, matching backprop performance with over 90% less adaptation time and up to 95% memory savings versus memory-based methods.
Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation cs.SE · 2026-04-16 · unverdicted · none · ref 9
RAG-enhanced LLMs show generally positive effects on automated test generation and code inspection by supplying supplementary context that reduces hallucinations.

Generalization through memorization: Nearest neighbor language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer