hub

Learning long-term dependencies with gradient descent is difficult.IEEE Transactions on Neural Networks, 5(2):157–166

Y. Bengio, P. Simard, P. Frasconi · 1994 · IEEE Transactions on Neural Networks · DOI 10.1109/72.279181

11 Pith papers cite this work, alongside 6,592 external citations. Polarity classification is still indexing.

11 Pith papers citing it

6,592 external citations · Crossref

open at publisher browse 11 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

MinMax Recurrent Neural Cascades

cs.LG · 2026-05-07 · conditional · novelty 8.0 · 2 refs

MinMax RNCs are recurrent neural models using min-max recurrence that achieve full regular-language expressivity, logarithmic parallel evaluation, uniformly bounded states, and constant state gradients independent of time distance.

HRM-Text: Efficient Pretraining Beyond Scaling

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

A 1B-parameter hierarchical recurrent model pretrained on 40B instruction-response tokens achieves 60.7% MMLU and strong results on ARC-C, DROP, GSM8K, and MATH while using 100-900x fewer tokens than standard baselines.

Do Neural Operators Forget Geometry? The Forgetting Hypothesis in Deep Operator Learning

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Neural operators progressively forget domain geometry with depth due to Markovian layers and global mixing; a geometry memory injection mechanism mitigates this forgetting.

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.

M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling

cs.LG · 2026-03-15 · unverdicted · novelty 6.0

M²RNN achieves perfect state tracking at unseen lengths and outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity on 7B models with 3x smaller recurrent states.

mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling

cs.LG · 2025-07-02 · unverdicted · novelty 6.0

mGRADE uses learnable-spaced convolutions shown to be equivalent to delay embeddings plus a lightweight gated recurrent component to achieve low-memory multi-timescale sequence modeling.

The Serial Scaling Hypothesis

cs.LG · 2025-07-16 · unverdicted · novelty 5.0

The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.

LLM4Log: A Systematic Review of Large Language Model-based Log Analysis

cs.SE · 2026-03-18 · unverdicted · novelty 4.0 · 2 refs

Systematic review of 145 papers on LLM-based log analysis, providing a unified taxonomy, common design patterns, evaluation practices, and challenges for deployment under drift and limited labels.

Predicting one-year clinical instability and mortality in heart failure patients using sequence modeling

cs.LG · 2025-11-20 · unverdicted · novelty 4.0

Sequence models on EHR data from a Swedish heart failure cohort achieve AUPRCs of 0.555 to 0.854 for one-year instability and mortality predictions and support four care pathways.

Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers

math.OC · 2026-04-13 · unverdicted · novelty 2.0

A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.

Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations

cs.AR · 2026-05-12

citing papers explorer

Showing 11 of 11 citing papers.

MinMax Recurrent Neural Cascades cs.LG · 2026-05-07 · conditional · none · ref 1 · 2 links
MinMax RNCs are recurrent neural models using min-max recurrence that achieve full regular-language expressivity, logarithmic parallel evaluation, uniformly bounded states, and constant state gradients independent of time distance.
HRM-Text: Efficient Pretraining Beyond Scaling cs.CL · 2026-05-20 · unverdicted · none · ref 5
A 1B-parameter hierarchical recurrent model pretrained on 40B instruction-response tokens achieves 60.7% MMLU and strong results on ARC-C, DROP, GSM8K, and MATH while using 100-900x fewer tokens than standard baselines.
Do Neural Operators Forget Geometry? The Forgetting Hypothesis in Deep Operator Learning cs.LG · 2026-05-07 · unverdicted · none · ref 2
Neural operators progressively forget domain geometry with depth due to Markovian layers and global mixing; a geometry memory injection mechanism mitigates this forgetting.
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies cs.LG · 2026-05-04 · unverdicted · none · ref 135
OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.
M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling cs.LG · 2026-03-15 · unverdicted · none · ref 2
M²RNN achieves perfect state tracking at unseen lengths and outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity on 7B models with 3x smaller recurrent states.
mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling cs.LG · 2025-07-02 · unverdicted · none · ref 2
mGRADE uses learnable-spaced convolutions shown to be equivalent to delay embeddings plus a lightweight gated recurrent component to achieve low-memory multi-timescale sequence modeling.
The Serial Scaling Hypothesis cs.LG · 2025-07-16 · unverdicted · none · ref 8
The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.
LLM4Log: A Systematic Review of Large Language Model-based Log Analysis cs.SE · 2026-03-18 · unverdicted · none · ref 8 · 2 links
Systematic review of 145 papers on LLM-based log analysis, providing a unified taxonomy, common design patterns, evaluation practices, and challenges for deployment under drift and limited labels.
Predicting one-year clinical instability and mortality in heart failure patients using sequence modeling cs.LG · 2025-11-20 · unverdicted · none · ref 15
Sequence models on EHR data from a Swedish heart failure cohort achieve AUPRCs of 0.555 to 0.854 for one-year instability and mortality predictions and support four care pathways.
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers math.OC · 2026-04-13 · unverdicted · none · ref 17
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.
Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations cs.AR · 2026-05-12 · unreviewed · ref 29

Learning long-term dependencies with gradient descent is difficult.IEEE Transactions on Neural Networks, 5(2):157–166

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer