{"total":21,"items":[{"citing_arxiv_id":"2605.27686","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Tensor Memory: Fixed-Size Recurrent State for Long-Horizon Transformers","primary_cat":"cs.CV","submitted_at":"2026-05-26T21:03:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Tensor Memory augments Transformers with a constant-size 3D voxel grid using differentiable soft writes at predicted locations, local interaction, and gated recurrent dynamics to decouple memory capacity from sequence length.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25716","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"An Efficient and Privacy-Preserving Architecture for Cross-Institutional Collaborative RAG","primary_cat":"cs.CR","submitted_at":"2026-05-25T11:18:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FedRAG uses a Scrambled Distributed Attention protocol with feature scrambling and token permutation to enable high-throughput, privacy-preserving federated RAG without special hardware or retraining.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17630","ref_index":57,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SegRAG: Training-Free Retrieval-Augmented Semantic Segmentation","primary_cat":"cs.CV","submitted_at":"2026-05-17T19:51:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SegRAG is a training-free retrieval-augmented framework that extracts class-specific point prompts from a filtered DINOv3 feature bank to boost SAM3 semantic segmentation performance on standard and agricultural 
benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16893","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"NGM: A Plug-and-Play Training-Free Memory Module for LLMs","primary_cat":"cs.AI","submitted_at":"2026-05-16T09:12:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"NGM is a plug-and-play n-gram memory module that encodes n-grams from pretrained embeddings and gates their injection to improve LLM performance by 0.5-1.2 points on average across eight benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06225","ref_index":16,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Memory Inception: Latent-Space KV Cache Manipulation for Steering LLMs","primary_cat":"cs.LG","submitted_at":"2026-05-07T13:19:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Memory Inception is a training-free method that injects latent KV banks at chosen layers to steer LLMs, achieving superior control-drift balance and up to 118x storage reduction on personality and structured-reasoning tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06216","ref_index":99,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TIDE: Every Layer Knows the Token Beneath the Context","primary_cat":"cs.CL","submitted_at":"2026-05-07T13:16:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and 
contextual collapse.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04651","ref_index":11,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation","primary_cat":"cs.LG","submitted_at":"2026-05-06T08:58:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FAAST performs test-time supervised adaptation by analytically deriving fast weights from examples in one forward pass, matching backprop performance with over 90% less adaptation time and up to 95% memory savings versus memory-based methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"fast-weight matrix computed by solving the linear regression problem min_W ∥KW − V∥_F^2, (9) The optimal solution is given analytically by W⋆ = K†V ∈ R^{d_x×d_y}, (10) where K† denotes the Moore-Penrose pseudoinverse (Penrose, 1955). The Moore-Penrose pseudoinverse can be solved by singular value decomposition (SVD), a spectral transform of matrix. Specifically, let the SVD of K be K = UΣR⊤, (11) where singular values Σ = diag(σ_1, ..., σ_r) with σ_1 ≥ ··· ≥ σ_r > 0, and singular vectors U and R have orthonormal columns. The pseudoinverse is then given by K† = RΣ†U⊤, Σ† = diag(σ_1^{-1}, ..., 
σ_r^{-1}), (12) and the fast weights can be written as W⋆ = RΣ†U⊤V. (13) The computation of W⋆ involves only a single forward"},{"citing_arxiv_id":"2605.08143","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HoReN: Normalized Hopfield Retrieval for Large-Scale Sequential Model Editing","primary_cat":"cs.LG","submitted_at":"2026-05-02T15:51:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HoReN is a parameter-preserving editor that wraps an MLP with a Hopfield codebook memory and scales to 50K sequential edits on ZsRE while maintaining performance above 0.93.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20932","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks","primary_cat":"cs.CR","submitted_at":"2026-04-22T11:17:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A context-aware Sentinel-Strategist system for RAG selectively applies defenses to block membership inference and data poisoning while recovering most retrieval utility compared to always-on defense stacks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"depend solely on fixed internal parameters, RAG systems actively fetch semantically relevant context from external databases. As illustrated in Fig. 1, the pipeline consists of three phases: ingestion (document chunking and embedding), retrieval (top-k similarity search), and augmentation (prompt construction for the generator) [4]. 
Recent developments have progressed to include advanced techniques such as token-level retrieval methods [21], adaptive data chunking strategies [22], and graph-structured knowledge models that represent complex relational dependencies among entities [23, 24]. 2.2 Privacy & Security Risks in RAG Systems Integrating external knowledge bases significantly improves LLMs, but it also changes the threat landscape by creating additional attack surfaces."},{"citing_arxiv_id":"2604.15270","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation","primary_cat":"cs.SE","submitted_at":"2026-04-16T17:41:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"RAG-enhanced LLMs show generally positive effects on automated test generation and code inspection by supplying supplementary context that reduces hallucinations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08519","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts","primary_cat":"cs.CL","submitted_at":"2026-04-09T17:55:50+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Loss-based pruning of training data to limit facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-parameter model on the full 
dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.10126","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models","primary_cat":"cs.RO","submitted_at":"2026-03-10T18:03:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AR-VLA introduces a standalone autoregressive action expert with long-lived memory that generates context-aware continuous actions for VLAs, replacing chunk-based heads with smoother trajectories and maintained task success.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.11183","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering","primary_cat":"cs.RO","submitted_at":"2026-01-30T05:03:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"NeuroKalman mitigates state drift in vision-language UAV navigation by using memory-augmented Kalman filtering where attention retrieves historical anchors to correct predictions without gradient updates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.26083","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism","primary_cat":"cs.LG","submitted_at":"2025-10-30T02:41:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Nirvana adds a task-aware memory trigger and updater to specialized generalist models, achieving strong general benchmark 
results, lowest perplexity in biomedicine/finance/law, and improved MRI reconstruction fidelity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.17934","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM","primary_cat":"cs.CL","submitted_at":"2025-10-20T15:40:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AtlasKV integrates billion-scale KGs into LLMs parametrically with sub-linear complexity and low memory by converting triples into key-value representations handled by the model's attention.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2401.18059","ref_index":150,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval","primary_cat":"cs.CL","submitted_at":"2024-01-31T18:30:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RAPTOR introduces a tree-organized retrieval method using recursive abstractive summaries, achieving a 20% absolute accuracy improvement on the QuALITY benchmark when paired with GPT-4.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2309.16671","ref_index":98,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Demystifying CLIP Data","primary_cat":"cs.CV","submitted_at":"2023-09-28T17:59:56+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 
68.3%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2303.16199","ref_index":267,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention","primary_cat":"cs.CV","submitted_at":"2023-03-28T17:59:12+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2201.08239","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LaMDA: Language Models for Dialog Applications","primary_cat":"cs.CL","submitted_at":"2022-01-20T15:44:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LaMDA shows that fine-tuning on human-value annotations and consulting external knowledge sources significantly improves safety and factual grounding in large dialog models beyond what scaling alone achieves.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"language models with retrieval systems. Most of the existing literature focuses on the problem of open-domain question-answering rather than dialog generation, and the models themselves are used to index and rank knowledge sources, rather than trained to use an intermediate tool. Given these differences, we note that the range of existing approaches to this problem include the RNNLM [34], RAG [35], REALM [36], and FiD [37] architectures. Zhu et al. [38] provide a survey of further recent work. See Karpukhin et al. 
[39] for details on the 'dense passage retriever' used in RAG. Recent work in this direction has expanded and elaborated on neural models' ability to retrieve and rank passages [40]. The RETRO architecture demonstrates that language models can be primed with results retrieved from"},{"citing_arxiv_id":"2002.08909","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"REALM: Retrieval-Augmented Language Model Pre-Training","primary_cat":"cs.CL","submitted_at":"2020-02-10T18:40:59+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"REALM augments language-model pre-training with an unsupervised retriever over Wikipedia documents and reports 4-16% absolute gains on open-domain QA benchmarks over prior implicit and explicit knowledge methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1909.08053","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism","primary_cat":"cs.CL","submitted_at":"2019-09-17T19:42:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Intra-layer model parallelism in PyTorch enables training of 8.3B-parameter transformers, achieving SOTA perplexity of 10.8 on WikiText103 and 66.5% accuracy on LAMBADA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}