REALM augments language-model pre-training with an unsupervised retriever over Wikipedia documents and reports 4-16% absolute gains on open-domain QA benchmarks over prior implicit and explicit knowledge methods.
hub
Gener- alization through memorization: Nearest neighbor language models.ArXiv, abs/1911.00172
27 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
AR-VLA introduces a standalone autoregressive action expert with long-lived memory that generates context-aware continuous actions for VLAs, replacing chunk-based heads with smoother trajectories and maintained task success.
NeuroKalman mitigates state drift in vision-language UAV navigation by using memory-augmented Kalman filtering where attention retrieves historical anchors to correct predictions without gradient updates.
LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
Intra-layer model parallelism in PyTorch enables training of 8.3B-parameter transformers, achieving SOTA perplexity of 10.8 on WikiText103 and 66.5% accuracy on LAMBADA.
HOLA pairs a compressive delta-rule recurrent state with a residual-selected exact KV cache and decoupled RMSNorm-gamma read, yielding lower perplexity than both standard linear attention and full-attention baselines on Wikitext and LAMBADA plus stronger needle-in-haystack recall.
InduceKV is a retrieval-based continual adaptation method that uses bilevel selection to build a compact set of inducing KV memories for fixed-footprint updates to multimodal LLMs.
A hybrid attention mechanism with editable request-local memory slots and sparse fallback achieves high accuracy on synthetic overwrite, version, and anti-pollution tasks where pure fixed-state or sparse methods fail, while identifying open-domain selection as the remaining bottleneck.
Tensor Memory augments Transformers with a constant-size 3D voxel grid using differentiable soft writes at predicted locations, local interaction, and gated recurrent dynamics to decouple memory capacity from sequence length.
FedRAG uses a Scrambled Distributed Attention protocol with feature scrambling and token permutation to enable high-throughput, privacy-preserving federated RAG without special hardware or retraining.
SegRAG is a training-free retrieval-augmented framework that extracts class-specific point prompts from a filtered DINOv3 feature bank to boost SAM3 semantic segmentation performance on standard and agricultural benchmarks.
Memory Inception is a training-free method that injects latent KV banks at chosen layers to steer LLMs, achieving superior control-drift balance and up to 118x storage reduction on personality and structured-reasoning tasks.
A context-aware Sentinel-Strategist system for RAG selectively applies defenses to block membership inference and data poisoning while recovering most retrieval utility compared to always-on defense stacks.
Loss-based pruning of training data to limit facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-parameter model on the full dataset.
AtlasKV integrates billion-scale KGs into LLMs parametrically with sub-linear complexity and low memory by converting triples into key-value representations handled by the model's attention.
RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.
MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.
LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.
MemSlides introduces a three-part memory hierarchy (user profile, working, tool) with scoped local revision for multi-turn personalized slide generation.
CFALR augments LLMs with collaborative filtering embeddings via trainable projection layers to outperform prior CF and LLM methods on Polyvore and IQON for personalized outfit tasks.
An audit finds language model filters and guardrails disproportionately suppress mentions of marginalized groups via lexical cues while failing to catch explicit harms.
NGM is a plug-and-play n-gram memory module that encodes n-grams from pretrained embeddings and gates their injection to improve LLM performance by 0.5-1.2 points on average across eight benchmarks.
TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.
FAAST performs test-time supervised adaptation by analytically deriving fast weights from examples in one forward pass, matching backprop performance with over 90% less adaptation time and up to 95% memory savings versus memory-based methods.
citing papers explorer
-
CFALR: Collaborative Filtering-Augmented Large Language Model for Personalized Fashion Outfit Recommendation
CFALR augments LLMs with collaborative filtering embeddings via trainable projection layers to outperform prior CF and LLM methods on Polyvore and IQON for personalized outfit tasks.