Geometric Factual Recall in Transformers
arXiv preprint arXiv:2601.07372, 2026.

A single-layer transformer memorizes random subject-attribute bijections in logarithmic embedding dimension, via linear superposition in the embeddings and ReLU-gated selection in the MLP, with zero-shot transfer to new facts and matching multi-hop constructions.
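The claimed construction is concrete enough to check numerically. Below is a minimal sketch of one plausible reading, not the paper's code: subjects and attributes get random unit-norm embeddings of dimension O(log N), and each fact is one ReLU neuron that gates on its subject and emits its attribute. The dimension constant and the 0.5 threshold are my assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1000                          # number of random subject-attribute facts
    d = int(12 * np.log(N))           # logarithmic embedding dimension (constant is a guess)

    # Random unit-norm embeddings are near-orthogonal in high dimension,
    # so facts can be superposed and later selected by a dot product.
    S = rng.standard_normal((N, d)); S /= np.linalg.norm(S, axis=1, keepdims=True)
    A = rng.standard_normal((N, d)); A /= np.linalg.norm(A, axis=1, keepdims=True)

    def recall(subject_vec, threshold=0.5):
        # One ReLU neuron per fact: fires only on its own subject (dot ~ 1,
        # cross-talk ~ 1/sqrt(d)), then writes that fact's attribute vector.
        gates = np.maximum(S @ subject_vec - threshold, 0.0)
        return gates @ A

    correct = sum(np.argmax(A @ recall(S[i])) == i for i in range(N))
    print(f"recall accuracy: {correct / N:.3f} at d = {d}")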
15 papers cite this work.
Representative citing papers (2026)
-
When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds
SignSGD provably beats SGD by a factor of the dimension d under sparse noise, via matched ℓ1-norm upper and lower bounds, with an equivalent result for Muon on matrices; the theory predicts faster GPT-2 pretraining.
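The update rules at issue are one-liners; the sketch below contrasts them, plus a matrix analogue in the spirit of Muon (the SVD orthogonalization is the exact operation Muon approximates iteratively; learning rates are placeholders, not the paper's tuned values).

    import numpy as np

    def sgd_step(theta, grad, lr=1e-2):
        # SGD: step length scales with the gradient's magnitude (l2 geometry).
        return theta - lr * grad

    def signsgd_step(theta, grad, lr=1e-3):
        # SignSGD: only the sign survives, so every coordinate moves equally;
        # under sparse gradient noise this matches the l1 geometry of the bounds.
        return theta - lr * np.sign(grad)

    def muon_like_step(theta, grad, lr=1e-3):
        # Matrix analogue: replace the gradient matrix by its nearest orthogonal
        # factor U V^T (a "matrix sign"); Muon approximates this iteratively.
        U, _, Vt = np.linalg.svd(grad, full_matrices=False)
        return theta - lr * (U @ Vt)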
-
Does Engram Do Memory Retrieval in Autoregressive Image Generation?
Engram in AR image generation saves backbone FLOPs but trails pure AR baselines in FID and behaves as a gated side-pathway rather than a content-addressed retriever.
-
NestPipe: Large-Scale Recommendation Training on 1,500+ Accelerators via Nested Pipelining
NestPipe achieves up to 3.06x speedup and 94.07% scaling efficiency on 1,536 workers via dual-buffer inter-batch and frozen-window intra-batch pipelining that overlaps communication with computation.
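Dual-buffer inter-batch pipelining belongs to the double-buffering family; the toy below shows only the overlap pattern, with sleeps standing in for embedding communication and dense computation (all names and timings are mine, not NestPipe's).

    import threading, time

    def fetch(batch_id):                      # stand-in for inter-batch communication
        time.sleep(0.01)
        return f"batch-{batch_id}"

    def compute(data):                        # stand-in for forward/backward compute
        time.sleep(0.01)

    def prefetch_into(buffers, slot, batch_id):
        buffers[slot] = fetch(batch_id)

    def train(num_batches=8):
        buffers = [fetch(0), None]            # two buffers: one in use, one filling
        for i in range(num_batches):
            cur, nxt = i % 2, (i + 1) % 2
            t = threading.Thread(target=prefetch_into, args=(buffers, nxt, i + 1))
            t.start()                         # fetch batch i+1 ...
            compute(buffers[cur])             # ... while computing batch i
            t.join()

    train()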
-
Phasor Memory Networks: Stable Backpropagation Through Time for Scalable Explicit Memory
PMNet uses unitary phasor dynamics and hierarchical anchors to make explicit memory stable over long sequences, matching a 3x larger Mamba model on long-context robustness with a 119M-parameter network.
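The stability argument behind unitary phasor dynamics is easy to demonstrate: a diagonal transition of unit-modulus complex numbers rotates the state without changing its norm, so the recurrent Jacobian can neither explode nor vanish over long horizons. A toy check (the anchor hierarchy and any learned parameters are omitted):

    import numpy as np

    rng = np.random.default_rng(0)
    d, T = 64, 100_000

    omega = rng.uniform(-np.pi, np.pi, d)   # per-unit rotation frequencies
    phasor = np.exp(1j * omega)             # unit-modulus diagonal transition

    h = rng.standard_normal(d) + 1j * rng.standard_normal(d)
    norm0 = np.linalg.norm(h)
    for _ in range(T):
        h = phasor * h                      # pure rotation: |h| is preserved
    print(norm0, np.linalg.norm(h))         # equal up to float round-off; gradients
                                            # through T steps neither explode nor vanish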
-
Conditional Memory Enhanced Item Representation for Generative Recommendation
ComeIR introduces dual-level Engram memory and memory-restoring prediction to reconstruct SID-token embeddings and restore token granularity in generative recommendation.
-
Geospatial-Temporal Sensemaking of Remote Sensing Activity Detections with Multimodal Large Language Model
Introduces the SMART-HC-VQA dataset with 65k single-image and 2.3M temporal VQA examples plus an adapted LLaVA-NeXT MLLM framework for geospatial-temporal sensemaking of remote sensing construction activity.
-
Contextual Memory-Enhanced Source Coding for Low-SNR Communications
MASC internalizes multi-order n-gram patterns via shared PCM and MMER routing to refine source probabilities, shorten codelengths, and reduce sensitivity to channel errors in SSCC for low-SNR regimes.
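The codelength claim is textbook source coding: a model that assigns higher probability to the stream needs fewer bits (codelength ≈ −log2 p). The toy below interpolates bigram statistics into a unigram model as a stand-in for the multi-order memory; the PCM/MMER internals are not reproduced, so everything here is illustrative.

    import math
    from collections import Counter

    text = "abababababcabababab"
    uni = Counter(text)
    bi = Counter(zip(text, text[1:]))

    def codelength(seq, lam):
        # Bits to encode seq under a unigram model interpolated with bigrams.
        bits = -math.log2(uni[seq[0]] / len(seq))
        for a, b in zip(seq, seq[1:]):
            p = lam * bi[(a, b)] / uni[a] + (1 - lam) * uni[b] / len(seq)
            bits += -math.log2(p)
        return bits

    # Mixing in higher-order statistics shortens the code for this source.
    print(f"unigram only: {codelength(text, 0.0):.1f} bits")
    print(f"with bigrams: {codelength(text, 0.9):.1f} bits")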
-
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.
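A rough sketch of the idea as summarized, with every name and shape my own invention: the decoder's hidden state receives a direct, step-wise injection of projected visual features, so the visual signal no longer depends on attention weights that decay as the generated sequence grows.

    import numpy as np

    rng = np.random.default_rng(0)
    d_vis, d_model = 32, 64
    W_proj = 0.1 * rng.standard_normal((d_model, d_vis))  # parallel-branch projection (assumed)

    def decode_step(hidden, visual_tokens):
        hidden = np.tanh(hidden)                  # stand-in for the LVLM decoder update
        # Parallel branch: re-inject pooled visual embeddings at every step,
        # instead of relying on attention back to the (distant) image tokens.
        return hidden + W_proj @ visual_tokens.mean(axis=0)

    h = rng.standard_normal(d_model)
    vis = rng.standard_normal((16, d_vis))        # 16 encoded image patches
    for _ in range(512):                          # long generation: signal persists
        h = decode_step(h, vis)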
-
Beyond N-gram: Data-Aware X-GRAM Extraction for Efficient Embedding Parameter Scaling
X-GRAM applies data-aware dynamic token injection with hybrid hashing and local feature extraction, gaining up to 4.4 accuracy points over vanilla backbones and 3.2 points over retrieval baselines at 0.73B-1.15B parameter scales while using 50% smaller embedding tables.
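The table-shrinking mechanic is presumably a relative of the hashing trick: map the open-ended n-gram space into a fixed number of rows and accept collisions. A generic sketch under that assumption (X-GRAM's hybrid hashing and dynamic injection are not reproduced):

    import numpy as np

    rng = np.random.default_rng(0)
    TABLE_ROWS, DIM = 2**14, 64                 # table far smaller than the n-gram space
    table = rng.standard_normal((TABLE_ROWS, DIM)).astype(np.float32)

    def ngram_embedding(token_ids, n=3):
        # Hash each n-gram into a fixed-size table (the hashing trick) and pool;
        # collisions are the price paid for the smaller table.
        vec = np.zeros(DIM, dtype=np.float32)
        for i in range(len(token_ids) - n + 1):
            row = hash(tuple(token_ids[i:i + n])) % TABLE_ROWS
            vec += table[row]
        return vec

    print(ngram_embedding([5, 9, 2, 7, 1]).shape)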
-
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
MEDS improves LLM RL performance by up to 4.13 points in pass@1 and 4.37 points in pass@128 by dynamically penalizing rollouts that match prevalent historical error clusters, identified via memory-stored representations and density clustering.
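A minimal sketch of the shaping loop under my own assumptions: failed-rollout embeddings accumulate in memory, DBSCAN marks the dense (prevalent) error modes, and a new rollout whose embedding lands inside one is penalized. Thresholds, dimensions, and the specific clustering choice are placeholders.

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    # Memory of past failed-rollout embeddings, synthesized as three error modes.
    centers = 5.0 * rng.standard_normal((3, 16))
    error_memory = np.concatenate(
        [c + 0.3 * rng.standard_normal((40, 16)) for c in centers])

    # Density clustering: label -1 = sparse outlier, >= 0 = prevalent error cluster.
    labels = DBSCAN(eps=2.0, min_samples=5).fit(error_memory).labels_
    prevalent = error_memory[labels >= 0]

    def shaped_reward(base_reward, rollout_emb, radius=2.0, penalty=0.5):
        # Penalize rollouts that repeat a prevalent historical error.
        if len(prevalent) and np.linalg.norm(prevalent - rollout_emb, axis=1).min() < radius:
            return base_reward - penalty
        return base_reward

    print(shaped_reward(1.0, error_memory[0]))    # inside a cluster -> penalized
    print(shaped_reward(1.0, 100 * np.ones(16)))  # far away -> unchanged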
-
Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
Loss-based pruning of training data that limits the number of facts and flattens their frequency distribution lets a 110M-parameter GPT-2 model memorize 1.3 times more entity facts than standard training, matching a 1.3B-parameter model trained on the full dataset.
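The frequency-flattening half of the recipe fits in a few lines; the loss-based half (deciding which occurrences to keep) is reduced here to "keep the first ones", so this is the shape of the method, not the method itself.

    from collections import Counter

    def prune_to_flatten(examples, fact_of, max_per_fact=4):
        # Cap how often each fact occurs, flattening the frequency distribution;
        # a loss-based ranking would decide WHICH occurrences survive the cap.
        counts, kept = Counter(), []
        for ex in examples:
            fact = fact_of(ex)
            if counts[fact] < max_per_fact:
                counts[fact] += 1
                kept.append(ex)
        return kept

    data = [("Paris", "capital_of", "France")] * 100 \
         + [("Oslo", "capital_of", "Norway")] * 2
    print(len(prune_to_flatten(data, fact_of=lambda ex: ex)))  # 6: head capped, tail kept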
-
Decoupling the Benefits of Subword Tokenization for Language Model Training via Byte-level Simulation
Subword tokenization's main benefits arise from higher sample throughput and from subword boundaries acting as explicit priors or inductive biases, as isolated via controlled byte-level simulations.
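One way to isolate "boundaries as an explicit prior" is to train on raw bytes but splice in a marker byte wherever a subword tokenizer would have cut. Whether this matches the paper's exact protocol is an assumption, and the toy vocabulary below is mine.

    VOCAB = {"un", "think", "ing"}     # toy subword inventory (illustrative only)
    SEP = 0x1F                         # unit-separator byte marks a subword boundary

    def bytes_with_boundaries(word: str) -> bytes:
        # Greedy longest-match segmentation, then raw bytes plus boundary bytes:
        # the byte-level model sees the same cut points a subword tokenizer imposes.
        out, i = bytearray(), 0
        while i < len(word):
            for j in range(len(word), i, -1):
                if word[i:j] in VOCAB or j == i + 1:
                    out += word[i:j].encode()
                    out.append(SEP)
                    i = j
                    break
        return bytes(out)

    print(bytes_with_boundaries("unthinking"))  # b'un\x1fthink\x1fing\x1f'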
-
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
-
MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought
MemCoT redefines long-context reasoning as iterative stateful search with zoom-in/zoom-out memory perception and dual short-term memories, claiming SOTA results on LoCoMo and LongMemEval-S benchmarks.