RNN computation is recovered from multi-hop graph pathways, and constraining these pathways via resolvent regularization yields improved temporal sparsity and task performance over standard L1.
Saxe, James L
4 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background 2
Citation polarities: background 2
Citing papers explorer
- Unifying Dynamical Systems and Graph Theory to Mechanistically Understand Computation in Neural Networks
  RNN computation is recovered from multi-hop graph pathways, and constraining these pathways via resolvent regularization yields improved temporal sparsity and task performance over standard L1.
- Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics
  Weight decay controls distinct learning regimes in grokking transformers on modular arithmetic, tracked by new cheap attention-based diagnostics with empirical critical value and exponent fits.
- Deep sequence models tend to memorize geometrically; it is unclear why
  Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.
- Attention to task structure for cognitive flexibility
  Task connectivity in graph-structured multi-task environments enhances generalization and stability, with stronger benefits for attention models than MLPs.