Neural machine translation in linear time. arXiv:1610.10099
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
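As a minimal illustration of the symmetry principle that summary refers to (a toy example written for this page, not taken from the paper), the Python sketch below checks that a sum-pooling readout over node features is invariant to relabeling the nodes, the permutation symmetry that motivates graph architectures in the geometric deep learning framework:

```python
# Toy illustration (not from the paper): a sum-pooling readout over a set of
# node features is invariant under any permutation of the nodes, which is the
# symmetry group that graph/set architectures are built to respect.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))        # 5 nodes, 8 features each
perm = rng.permutation(5)          # arbitrary relabeling of the nodes

def readout(feats):
    # Sum over nodes, then a nonlinearity: node order cannot matter.
    return np.tanh(feats.sum(axis=0))

assert np.allclose(readout(X), readout(X[perm]))
print("sum-pooling readout is invariant to node relabeling")
```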
citing papers explorer
- Scalable Memristive-Friendly Reservoir Computing for Time Series Classification: MARS parallel reservoirs achieve up to 21x training speedups and outperform LRU, S5, and Mamba on long-sequence benchmarks while remaining gradient-free and compact (a minimal reservoir sketch follows this list).
- Revisiting Transformer Layer Parameterization Through Causal Energy Minimization: CEM recasts Transformer layers as energy minimization steps, enabling constrained parameterizations such as weight sharing and low-rank interactions that match standard baselines in 100M-scale language modeling (a tied-weights sketch follows this list).
- Attention Is All You Need: Pith review generated a malformed one-line summary.
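Reservoir sketch for the MARS entry: the summary's key claims are that training stays gradient-free and that independent reservoirs can run in parallel. The Python sketch below illustrates the generic echo-state recipe those claims rest on, with fixed random recurrent weights and only a closed-form ridge readout being trained. The sizes, the two-reservoir arrangement, and names like `run_reservoir` are illustrative assumptions, not the MARS architecture or its memristive-friendly constraints.

```python
# Generic echo-state-style reservoir sketch (assumed illustration, not MARS):
# recurrent weights are fixed and random, so the only training step is a
# closed-form ridge regression on the readout, with no backprop at all.
import numpy as np

def run_reservoir(u, n_res=100, rho=0.9, seed=0):
    """Drive one fixed random reservoir with input sequence u; return states."""
    r = np.random.default_rng(seed)
    W_in = r.uniform(-0.5, 0.5, size=(n_res, 1))
    W = r.normal(size=(n_res, n_res))
    W *= rho / np.abs(np.linalg.eigvals(W)).max()   # keep spectral radius < 1
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u_t))
        states.append(x)
    return np.array(states)

# Toy task: next-step prediction on a sine wave.
t = np.linspace(0, 8 * np.pi, 400)
u, y = np.sin(t[:-1]), np.sin(t[1:])

# "Parallel" here just means independent reservoirs whose states are
# concatenated; each can be driven on its own worker, gradient-free.
X = np.hstack([run_reservoir(u, seed=s) for s in (1, 2)])

# Ridge-regression readout: the only trained parameters, solved in closed form.
lam = 1e-6
W_out = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print("train MSE:", np.mean((X @ W_out - y) ** 2))
```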
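Tied-weights sketch for the CEM entry: if each Transformer layer is read as one step of minimizing a fixed energy, the same layer can plausibly be reapplied across depth instead of stacking distinct layers. The Python sketch below shows only that generic tied-weights construction; `TiedTransformerEncoder` and its hyperparameters are assumptions for illustration, not the paper's actual CEM parameterization (which, per the summary, also covers low-rank interactions).

```python
# Generic tied-weights Transformer encoder (assumed illustration, not the CEM
# design): one shared layer is applied repeatedly, the way repeated descent
# steps reuse the same fixed energy function.
import torch
import torch.nn as nn

class TiedTransformerEncoder(nn.Module):
    def __init__(self, d_model=64, nhead=4, steps=6):
        super().__init__()
        # A single shared layer stands in for `steps` distinct layers.
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.steps = steps

    def forward(self, x):
        # Repeated application of one layer ~ iterative minimization steps.
        for _ in range(self.steps):
            x = self.layer(x)
        return x

model = TiedTransformerEncoder()
x = torch.randn(2, 10, 64)                       # (batch, seq, d_model)
print(model(x).shape)                            # torch.Size([2, 10, 64])
n_params = sum(p.numel() for p in model.parameters())
print(f"params for 6 tied steps: {n_params} (untied layers would need ~6x)")
```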