hub

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, Yoshua Bengio · 2014 · cs.NE · arXiv 1412.3555

35 Pith papers cite this work. Polarity classification is still indexing.

35 Pith papers citing it

open full Pith review browse 35 citing papers arXiv PDF

abstract

In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU). We evaluate these recurrent units on the tasks of polyphonic music modeling and speech signal modeling. Our experiments revealed that these advanced recurrent units are indeed better than more traditional recurrent units such as tanh units. Also, we found GRU to be comparable to LSTM.

hub tools

JSON dossier citing papers JSON arXiv source

claims ledger

abstract In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU). We evaluate these recurrent units on the tasks of polyphonic music modeling and speech signal modeling. Our experiments revealed that these advanced recurrent units are indeed better than more traditional recurrent units such as tanh units. Also, we found GRU to be comparable to LSTM.

co-cited works

representative citing papers

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.

Vector-Quantized Discrete Latent Factors Meet Financial Priors: Dynamic Cross-Sectional Stock Ranking Prediction for Portfolio Construction

cs.LG · 2026-05-13 · conditional · novelty 7.0

PRISM-VQ integrates vector-quantized latent factors with financial priors and a structure-conditioned mixture-of-experts to deliver improved cross-sectional stock return predictions and portfolio performance on CSI 300 and S&P 500.

What-Where Transformer: A Slot-Centric Visual Backbone for Concurrent Representation and Localization

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

The What-Where Transformer achieves explicit what-where separation in a ViT-style backbone via concurrent token and attention-map streams, yielding emergent object discovery from attention maps and better weakly-supervised localization.

TCRTransBench: A Comprehensive Benchmark for Bidirectional TCR-Peptide Sequence Generation

q-bio.CB · 2026-05-06 · unverdicted · novelty 7.0

TCRTransBench provides a new benchmark with bidirectional TCR-peptide generation tasks, a large validated dataset, and metrics to evaluate neural models for immunological sequence modeling.

Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning

cs.LG · 2026-04-09 · unverdicted · novelty 7.0

CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels.

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

cs.LG · 2024-05-31 · unverdicted · novelty 7.0

Transformers and SSMs are unified through structured state space duality, producing a 2-8X faster Mamba-2 model that remains competitive with Transformers.

What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.

DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions

cs.RO · 2026-05-07 · unverdicted · novelty 6.0

DexSynRefine synthesizes HOI motions with an extended manifold method, refines them via task-space residual RL, and adapts for sim-to-real transfer, outperforming kinematic retargeting by 50-70 percentage points on five dexterous tasks.

Learning to Theorize the World from Observation

cs.LG · 2026-05-05 · unverdicted · novelty 6.0

NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.

Object Referring-Guided Scanpath Prediction with Perception-Enhanced Vision-Language Models

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

ScanVLA uses a vision-language model with a history-enhanced decoder and frozen segmentation LoRA to outperform prior methods on object-referring scanpath prediction.

UniDetect: LLM-Driven Universal Fraud Detection across Heterogeneous Blockchains

cs.CR · 2026-04-14 · unverdicted · novelty 6.0

UniDetect is an LLM-based system that generates universal transaction summary texts and uses two-stage multimodal training on text plus graphs to detect fraudulent accounts across heterogeneous blockchains, outperforming baselines by 5.57-7.58% KS and achieving over 94.58% zero-shot cross-chain and

Learning to Test: Physics-Informed Representation for Dynamical Instability Detection

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

A physics-informed neural representation is learned from safe data to support distributional hypothesis testing for dynamical instability in stochastic DAE systems without repeated simulations.

RF-LEGO: Modularized Signal Processing-Deep Learning Co-Design for RF Sensing via Deep Unrolling

cs.DC · 2026-04-11 · unverdicted · novelty 6.0

RF-LEGO turns signal processing algorithms into trainable modular DL modules via deep unrolling, outperforming pure SP and DL baselines in RF sensing while preserving interpretability.

Behavior-Aware Item Modeling via Dynamic Procedural Solution Representations for Knowledge Tracing

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

BAIM enriches knowledge tracing item representations by deriving stage-level embeddings from Polya's four problem-solving stages and routing them adaptively per learner context, yielding consistent gains over pretraining baselines on two datasets.

CWRNN-INVR: A Coupled WarpRNN based Implicit Neural Video Representation

eess.IV · 2026-04-08 · unverdicted · novelty 6.0

CWRNN-INVR combines WarpRNN for structured video information and residual grids for irregular details to reach 33.73 dB average PSNR on the UVG dataset at 3M parameters, outperforming existing INVR methods.

IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents

cs.AI · 2026-04-06 · unverdicted · novelty 6.0

IntentScore learns intent-conditioned action scores from offline GUI trajectories and raises task success by 6.9 points on an unseen agent and environment.

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

cs.CL · 2025-06-16 · unverdicted · novelty 6.0

MiniMax-M1 is a 456B parameter hybrid-attention MoE model trained with CISPO RL that achieves performance comparable or superior to DeepSeek-R1 and Qwen3-235B on reasoning and software engineering tasks while training in three weeks on 512 GPUs.

Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging

cs.LG · 2026-05-11 · unverdicted · novelty 5.0

Randomly initialized Transformers act as adaptive sequence smoothers for sleep staging via a Random Attention Prior Kernel, with gains mainly from inductive bias rather than training.

MDN: Parallelizing Stepwise Momentum for Delta Linear Attention

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.

Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning

cs.LG · 2026-05-06 · unverdicted · novelty 5.0 · 2 refs

Recurrent RL policies can have their hidden states aligned with PMP co-states through a derived loss, yielding robust performance on partially observable control tasks.

ReMedi: Reasoner for Medical Clinical Prediction

cs.CL · 2026-05-02 · unverdicted · novelty 5.0

ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.

To Use AI as Dice of Possibilities with Timing Computation

cs.AI · 2026-05-01 · unverdicted · novelty 5.0

Proposes verb-based paradigm with timing computation to enable data-driven discovery of patient trajectories and counterfactual timing from EHR data without domain knowledge.

HOI-aware Adaptive Network for Weakly-supervised Action Segmentation

cs.CV · 2026-04-29 · unverdicted · novelty 5.0

AdaAct employs a HOI encoder and two-branch hypernetwork to adaptively adjust temporal encoding parameters based on video-level human-object interactions for improved weakly-supervised action segmentation.

citing papers explorer

Showing 34 of 34 citing papers after filters.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces cs.LG · 2023-12-01 · unverdicted · none · ref 17 · internal anchor
Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.
TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment cs.CL · 2026-05-13 · unverdicted · none · ref 89 · internal anchor
TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.
What-Where Transformer: A Slot-Centric Visual Backbone for Concurrent Representation and Localization cs.CV · 2026-05-12 · unverdicted · none · ref 13 · internal anchor
The What-Where Transformer achieves explicit what-where separation in a ViT-style backbone via concurrent token and attention-map streams, yielding emergent object discovery from attention maps and better weakly-supervised localization.
TCRTransBench: A Comprehensive Benchmark for Bidirectional TCR-Peptide Sequence Generation q-bio.CB · 2026-05-06 · unverdicted · none · ref 14 · internal anchor
TCRTransBench provides a new benchmark with bidirectional TCR-peptide generation tasks, a large validated dataset, and metrics to evaluate neural models for immunological sequence modeling.
Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning cs.LG · 2026-04-09 · unverdicted · none · ref 46 · internal anchor
CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels.
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality cs.LG · 2024-05-31 · unverdicted · none · ref 21 · internal anchor
Transformers and SSMs are unified through structured state space duality, producing a 2-8X faster Mamba-2 model that remains competitive with Transformers.
What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies cs.LG · 2026-05-08 · unverdicted · none · ref 97 · internal anchor
MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.
DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions cs.RO · 2026-05-07 · unverdicted · none · ref 26 · internal anchor
DexSynRefine synthesizes HOI motions with an extended manifold method, refines them via task-space residual RL, and adapts for sim-to-real transfer, outperforming kinematic retargeting by 50-70 percentage points on five dexterous tasks.
Learning to Theorize the World from Observation cs.LG · 2026-05-05 · unverdicted · none · ref 85 · internal anchor
NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.
Object Referring-Guided Scanpath Prediction with Perception-Enhanced Vision-Language Models cs.CV · 2026-04-22 · unverdicted · none · ref 8 · internal anchor
ScanVLA uses a vision-language model with a history-enhanced decoder and frozen segmentation LoRA to outperform prior methods on object-referring scanpath prediction.
UniDetect: LLM-Driven Universal Fraud Detection across Heterogeneous Blockchains cs.CR · 2026-04-14 · unverdicted · none · ref 2 · internal anchor
UniDetect is an LLM-based system that generates universal transaction summary texts and uses two-stage multimodal training on text plus graphs to detect fraudulent accounts across heterogeneous blockchains, outperforming baselines by 5.57-7.58% KS and achieving over 94.58% zero-shot cross-chain and
Learning to Test: Physics-Informed Representation for Dynamical Instability Detection cs.LG · 2026-04-13 · unverdicted · none · ref 16 · internal anchor
A physics-informed neural representation is learned from safe data to support distributional hypothesis testing for dynamical instability in stochastic DAE systems without repeated simulations.
RF-LEGO: Modularized Signal Processing-Deep Learning Co-Design for RF Sensing via Deep Unrolling cs.DC · 2026-04-11 · unverdicted · none · ref 12 · internal anchor
RF-LEGO turns signal processing algorithms into trainable modular DL modules via deep unrolling, outperforming pure SP and DL baselines in RF sensing while preserving interpretability.
Behavior-Aware Item Modeling via Dynamic Procedural Solution Representations for Knowledge Tracing cs.CL · 2026-04-09 · unverdicted · none · ref 2 · internal anchor
BAIM enriches knowledge tracing item representations by deriving stage-level embeddings from Polya's four problem-solving stages and routing them adaptively per learner context, yielding consistent gains over pretraining baselines on two datasets.
CWRNN-INVR: A Coupled WarpRNN based Implicit Neural Video Representation eess.IV · 2026-04-08 · unverdicted · none · ref 39 · internal anchor
CWRNN-INVR combines WarpRNN for structured video information and residual grids for irregular details to reach 33.73 dB average PSNR on the UVG dataset at 3M parameters, outperforming existing INVR methods.
IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents cs.AI · 2026-04-06 · unverdicted · none · ref 4 · internal anchor
IntentScore learns intent-conditioned action scores from offline GUI trajectories and raises task success by 6.9 points on an unseen agent and environment.
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention cs.CL · 2025-06-16 · unverdicted · none · ref 5 · internal anchor
MiniMax-M1 is a 456B parameter hybrid-attention MoE model trained with CISPO RL that achieves performance comparable or superior to DeepSeek-R1 and Qwen3-235B on reasoning and software engineering tasks while training in three weeks on 512 GPUs.
Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging cs.LG · 2026-05-11 · unverdicted · none · ref 91 · internal anchor
Randomly initialized Transformers act as adaptive sequence smoothers for sleep staging via a Random Attention Prior Kernel, with gains mainly from inductive bias rather than training.
MDN: Parallelizing Stepwise Momentum for Delta Linear Attention cs.LG · 2026-05-07 · unverdicted · none · ref 108 · internal anchor
MDN parallelizes stepwise momentum for delta linear attention using geometric reordering and dynamical systems analysis, yielding performance gains over Mamba2 and GDN on 400M and 1.3B models.
Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning cs.LG · 2026-05-06 · unverdicted · none · ref 42 · 2 links · internal anchor
Recurrent RL policies can have their hidden states aligned with PMP co-states through a derived loss, yielding robust performance on partially observable control tasks.
ReMedi: Reasoner for Medical Clinical Prediction cs.CL · 2026-05-02 · unverdicted · none · ref 3 · internal anchor
ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.
To Use AI as Dice of Possibilities with Timing Computation cs.AI · 2026-05-01 · unverdicted · none · ref 48 · internal anchor
Proposes verb-based paradigm with timing computation to enable data-driven discovery of patient trajectories and counterfactual timing from EHR data without domain knowledge.
HOI-aware Adaptive Network for Weakly-supervised Action Segmentation cs.CV · 2026-04-29 · unverdicted · none · ref 5 · internal anchor
AdaAct employs a HOI encoder and two-branch hypernetwork to adaptively adjust temporal encoding parameters based on video-level human-object interactions for improved weakly-supervised action segmentation.
LASER: Learning Active Sensing for Continuum Field Reconstruction cs.LG · 2026-04-21 · unverdicted · none · ref 17 · internal anchor
LASER trains a reinforcement learning policy inside a latent dynamics model to choose sensor placements that improve reconstruction of continuum fields under sparsity.
STK-Adapter: Incorporating Evolving Graph and Event Chain for Temporal Knowledge Graph Extrapolation cs.IR · 2026-04-21 · unverdicted · none · ref 5 · internal anchor
STK-Adapter adds Spatial-Temporal MoE, Event-Aware MoE, and Cross-Modality Alignment MoE to integrate evolving TKG graphs and event chains into LLMs, reducing information loss and improving extrapolation performance over prior methods.
Gated Memory Policy cs.RO · 2026-04-21 · unverdicted · none · ref 9 · internal anchor
GMP selectively activates and represents memory via a gate and lightweight cross-attention, yielding 30.1% higher success on non-Markovian robotic tasks while staying competitive on Markovian ones.
Efficient and Effective Internal Memory Retrieval for LLM-Based Healthcare Prediction cs.CL · 2026-04-08 · unverdicted · none · ref 2 · internal anchor
K2K framework enables internal memory retrieval in LLMs for healthcare outcome prediction, achieving state-of-the-art results on four benchmarks.
Adaptive Learned State Estimation based on KalmanNet cs.RO · 2026-04-02 · unverdicted · none · ref 17 · internal anchor
AM-KNet adds sensor-specific modules, hypernetwork conditioning on target type and pose, and Joseph-form covariance estimation to KalmanNet, yielding better accuracy and stability than base KalmanNet on nuScenes and View-of-Delft data.
Attention Is All You Need cs.CL · 2017-06-12 · unverdicted · none · ref 7 · internal anchor
Pith review generated a malformed one-line summary.
Physics-based Digital Twins for Integrated Thermal Energy Systems Using Active Learning cs.LG · 2026-05-07 · unverdicted · none · ref 25 · internal anchor
Active learning with physics-informed surrogates achieves comparable accuracy for a glycol heat exchanger digital twin using only one-fifth the high-fidelity simulation trajectories needed by random sampling.
Quadruped Parkour Learning: Sparsely Gated Mixture of Experts with Visual Input cs.RO · 2026-04-21 · unverdicted · none · ref 32 · internal anchor
Sparsely gated MoE policies double the success rate of a real Unitree Go2 quadruped on large-obstacle parkour versus matched-active-parameter MLP baselines while cutting inference time compared with a scaled-up MLP.
Impact of leaky dynamics on predictive path integration accuracy in recurrent neural networks cs.NE · 2026-04-17 · unverdicted · none · ref 45 · internal anchor
Leaky RNNs improve grid-cell-like representations and path-integration accuracy by acting as a low-pass filter that stabilizes dynamics against noise.
Dual-Rerank: Fusing Causality and Utility for Industrial Generative Reranking cs.IR · 2026-04-08 · unverdicted · none · ref 5 · internal anchor
Dual-Rerank fuses autoregressive and non-autoregressive generative reranking via knowledge distillation and uses list-wise decoupled RL optimization to improve whole-page utility and cut latency in industrial video search.
Net Load Forecasting Using Machine Learning with Growing Renewable Power Capacity Features: A Comparative Study of Direct and Indirect Methods eess.SY · 2026-04-18 · unverdicted · none · ref 18 · internal anchor
Indirect LSTM outperformed direct and indirect FCNN approaches for net load forecasting when renewable capacity is included as a feature.

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

hub tools

claims ledger

co-cited works

fields

years

verdicts

representative citing papers

citing papers explorer