MARS parallel reservoirs achieve up to 21x training speedups and outperform LRU, S5, and Mamba on long sequence benchmarks while remaining gradient-free and compact.
citation dossier
Neural machine translation in linear time.arXiv:1610.10099
why this work matters in Pith
Pith has found this work in 4 reviewed papers. Its strongest current cluster is cs.LG (2 papers). The largest review-status bucket among citing papers is UNVERDICTED (3 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
representative citing papers
CEM recasts Transformer layers as energy minimization steps, enabling constrained parameterizations like weight sharing and low-rank interactions that match standard baselines in 100M-scale language modeling.
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
Pith review generated a malformed one-line summary.
citing papers explorer
-
Scalable Memristive-Friendly Reservoir Computing for Time Series Classification
MARS parallel reservoirs achieve up to 21x training speedups and outperform LRU, S5, and Mamba on long sequence benchmarks while remaining gradient-free and compact.
-
Revisiting Transformer Layer Parameterization Through Causal Energy Minimization
CEM recasts Transformer layers as energy minimization steps, enabling constrained parameterizations like weight sharing and low-rank interactions that match standard baselines in 100M-scale language modeling.
-
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
-
Attention Is All You Need
Pith review generated a malformed one-line summary.