Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio · 2014 · cs.NE · arXiv 1412.3555

Mixed citation behavior. Most common role is background (62%).

131 Pith papers citing it

abstract

In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU). We evaluate these recurrent units on the tasks of polyphonic music modeling and speech signal modeling. Our experiments revealed that these advanced recurrent units are indeed better than more traditional recurrent units such as tanh units. Also, we found GRU to be comparable to LSTM.
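For readers unfamiliar with the gating mechanism the abstract refers to, a single GRU update can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the helper names (`gru_step`, `init_params`), the parameter dictionary keys, the omission of bias terms, and the random initialisation are all assumptions made for the example. The update gate `z` interpolates between the previous state and a candidate state, and the reset gate `r` scales how much of the previous state feeds into that candidate.

```python
import math
import random


def sigmoid(v):
    """Element-wise logistic function on a list of floats."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def gru_step(x, h_prev, p):
    """One GRU update (biases omitted for brevity; an assumption of this sketch)."""
    # Update gate: how much of the candidate state replaces the old state.
    z = sigmoid(add(matvec(p["Wz"], x), matvec(p["Uz"], h_prev)))
    # Reset gate: how much of the old state is used to form the candidate.
    r = sigmoid(add(matvec(p["Wr"], x), matvec(p["Ur"], h_prev)))
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    # Candidate state computed from the input and the reset-gated old state.
    h_cand = [math.tanh(a) for a in add(matvec(p["W"], x), matvec(p["U"], gated))]
    # New state: per-unit linear interpolation between old and candidate.
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def init_params(n_in, n_hid, seed=0):
    """Hypothetical small random weights, for demonstration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wz": mat(n_hid, n_in), "Uz": mat(n_hid, n_hid),
        "Wr": mat(n_hid, n_in), "Ur": mat(n_hid, n_hid),
        "W": mat(n_hid, n_in), "U": mat(n_hid, n_hid),
    }


# Run the cell over a toy sequence of three one-hot inputs.
params = init_params(n_in=3, n_hid=4)
h = [0.0] * 4
for x in [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]:
    h = gru_step(x, h, params)
```

Some libraries swap the roles of `z` and `1 - z` in the final interpolation; the two conventions are equivalent up to relabelling the gate.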

citation-role summary

background 8 · method 3 · baseline 1 · other 1

citation-polarity summary

background 8 · use method 3 · baseline 1 · unclear 1

representative citing papers

CanViT: Toward Active-Vision Foundation Models

cs.CV · 2026-03-23 · conditional · novelty 8.0

CanViT is the first task- and policy-agnostic AVFM pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses, achieving 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 after fine-tuning.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

Identifying Latent Concepts and Structures for Generalized Category Discovery

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

CPF-GCD enforces low-rank compositional structure on vision backbone features via spatial primitive fields so that novel categories emerge as new activation patterns over a shared vocabulary of reusable visual primitives.

Offline Multi-agent Continual Cooperation via Skill Partition and Reuse

cs.AI · 2026-06-24 · unverdicted · novelty 7.0

COMAD discovers and reuses coordination skills from mixed offline MARL data via auto-encoders and density-based estimation to achieve continual learning with better transfer.

Urdu Katib Handwritten Dataset: A Historical Document Dataset for Offline Urdu Handwritten Text Recognition with CRNN-Based Baseline Evaluation

cs.CV · 2026-06-17 · unverdicted · novelty 7.0

Presents UKHD, the first historical offline Urdu handwritten text lines dataset from Katib materials, and benchmarks CRNN-based models with CNN-BGRU-CTC showing lowest CER and WER.

LongSpike: Fractional Order Spiking State Space Models for Efficient Long Sequence Learning

cs.LG · 2026-06-11 · unverdicted · novelty 7.0

LongSpike integrates fractional-order state-space modeling into spiking neural networks, enabling better long-sequence performance than prior SNNs on LRA, WikiText-103, and Speech Commands benchmarks while retaining sparse computation.

CoMetaPNS: Continually Meta-learning Personalized Neural Surrogates for Cardiac Electrophysiology Simulations

cs.LG · 2026-06-05 · unverdicted · novelty 7.0

CoMetaPNS combines meta-learned neural surrogates with a continual Bayesian Gaussian Mixture Model to adapt cardiac electrophysiology simulations to new data while avoiding catastrophic forgetting.

AdaState: Self-Evolving Anchors for Streaming Video Generation

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

AdaState replaces the static first-frame KV anchor with an evolving hidden latent that the model denoises alongside content, treating time as relative to enable recurrence and richer dynamics in streaming video generation.

LC-Flow: Learning Local Continuous Optical Flow and Confidence from events

cs.CV · 2026-05-23 · unverdicted · novelty 7.0

LC-Flow introduces a continuous local recurrent network for learning sparse optical flow and confidence directly from event streams, with confidence-guided aggregation reaching new SOTA on MVSEC.

Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms

hep-ph · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Nested-GPT is an autoregressive Transformer surrogate that generates variable-multiplicity parton showers while enforcing ordered Markovian branching and matches reference Monte Carlo results for leading-log non-global logarithm resummation in the large-Nc limit.

Identify Then Project: Contrastive Learning of Latent Dynamics from Partial Observations with Port-Hamiltonian Structure

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

A two-stage contrastive teacher-student framework learns and then projects latent dynamics onto port-Hamiltonian submanifolds from partial observations.

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.

Vector-Quantized Discrete Latent Factors Meet Financial Priors: Dynamic Cross-Sectional Stock Ranking Prediction for Portfolio Construction

cs.LG · 2026-05-13 · conditional · novelty 7.0

PRISM-VQ integrates vector-quantized latent factors with financial priors and a structure-conditioned mixture-of-experts to deliver improved cross-sectional stock return predictions and portfolio performance on CSI 300 and S&P 500.

What-Where Transformer: A Slot-Centric Visual Backbone for Concurrent Representation and Localization

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

The What-Where Transformer achieves explicit what-where separation in a ViT-style backbone via concurrent token and attention-map streams, yielding emergent object discovery from attention maps and better weakly-supervised localization.

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

TailedTS supplies 24.69 billion Wikipedia page-view records as a public benchmark for heavy-tailed time series forecasting and periodicity analysis, revealing weaker periodic structure in high-traffic pages.

TCRTransBench: A Comprehensive Benchmark for Bidirectional TCR-Peptide Sequence Generation

q-bio.CB · 2026-05-06 · unverdicted · novelty 7.0

TCRTransBench provides a new benchmark with bidirectional TCR-peptide generation tasks, a large validated dataset, and metrics to evaluate neural models for immunological sequence modeling.

Learning to Theorize the World from Observation

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.

Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning

cs.LG · 2026-04-09 · unverdicted · novelty 7.0

CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels.

Oscillators Are All You Need: Irregular Time Series Modelling via Damped Harmonic Oscillators with Closed-Form Solutions

cs.LG · 2026-02-12 · unverdicted · novelty 7.0

Damped harmonic oscillators with closed-form solutions model keys, values, and queries in continuous attention for irregular time series, preserving universal approximation while being orders of magnitude faster than prior NODE-based methods.

Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering

cs.RO · 2026-01-30 · unverdicted · novelty 7.0

NeuroKalman mitigates state drift in vision-language UAV navigation by using memory-augmented Kalman filtering where attention retrieves historical anchors to correct predictions without gradient updates.

ExDoS: Expert-Guided Dual-Focus Cross-Modal Distillation for Smart Contract Vulnerability Detection

cs.CR · 2025-09-12 · unverdicted · novelty 7.0

ExDoS uses expert-guided dual-focus distillation between source semantic graphs and bytecode control-flow graphs plus a dual-attention network to improve smart contract vulnerability detection, reporting 3-6% F1 gains over baselines.

Unsupervised Learning of Local Updates for Maximum Independent Set in Dynamic Graphs

cs.LG · 2025-05-19 · unverdicted · novelty 7.0

Unsupervised GNN model learns local updates for approximate MaxIS on dynamic graphs, achieving competitive ratios on 200-1000 node instances and 1.00-1.18x larger solutions than other unsupervised models when generalizing to 100x larger graphs.

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

cs.LG · 2024-05-31 · unverdicted · novelty 7.0

Transformers and SSMs are unified through structured state space duality, producing a 2-8X faster Mamba-2 model that remains competitive with Transformers.

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

cs.LG · 2024-02-29 · unverdicted · novelty 7.0

Griffin hybrid model matches Llama-2 performance while trained on over 6 times fewer tokens and offers lower inference latency with higher throughput.

citing papers explorer

Showing 7 of 7 citing papers after filters.

SCASRec: A Self-Correcting and Auto-Stopping Model for Generative Route List Recommendation

cs.IR · 2026-02-03 · unverdicted · none · ref 10 · internal anchor

SCASRec unifies ranking and redundancy elimination for route lists via stepwise corrective rewards and an adaptive end-of-recommendation token, claiming SOTA results on two datasets and real deployment.

DeGRe: Dense-supervised Generative Reranking for Recommendation

cs.IR · 2026-05-25 · unverdicted · none · ref 7 · internal anchor

DeGRe decouples offline exploration via a lookahead evaluator using beam search and cumulative regression to distill dense supervision into an online generator that approximates optimal reranking sequences with greedy decoding.

STK-Adapter: Incorporating Evolving Graph and Event Chain for Temporal Knowledge Graph Extrapolation

cs.IR · 2026-04-21 · unverdicted · none · ref 5 · internal anchor

STK-Adapter adds Spatial-Temporal MoE, Event-Aware MoE, and Cross-Modality Alignment MoE to integrate evolving TKG graphs and event chains into LLMs, reducing information loss and improving extrapolation performance over prior methods.

Multimodal Large Language Models with Adaptive Preference Optimization for Sequential Recommendation

cs.IR · 2025-11-24 · unverdicted · none · ref 8 · internal anchor

HaNoRec dynamically weights harder preference samples and applies Gaussian perturbations to output distributions to improve multimodal LLM performance on sequential recommendation tasks.

Click-Through Rate Prediction with the User Memory Network

cs.IR · 2019-07-09 · unverdicted · none · ref 4 · internal anchor

MA-DNN augments DNNs with per-user memory vectors capturing likes and dislikes to exploit historical behavior for CTR prediction while remaining simpler than RNNs.

Dual-Rerank: Fusing Causality and Utility for Industrial Generative Reranking

cs.IR · 2026-04-08 · unverdicted · none · ref 5 · internal anchor

Dual-Rerank fuses autoregressive and non-autoregressive generative reranking via knowledge distillation and uses list-wise decoupled RL optimization to improve whole-page utility and cut latency in industrial video search.

Global-local Spatial-temporal Aware Graph Attention Network for Network Traffic Forecasting

cs.IR · 2025-05-11 · unverdicted · none · ref 19 · internal anchor

GLSTaGAT is a spatial-temporal graph attention network using data-driven fusion graphs, global-local blocks, node normalization, and a transformer encoder to outperform baselines on real-world network traffic datasets.
