Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, Yoshua Bengio · 2014 · cs.NE · arXiv 1412.3555

Mixed citation behavior. Most common role is background (62%).

87 Pith papers citing it

Background 62% of classified citations

abstract

In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU). We evaluate these recurrent units on the tasks of polyphonic music modeling and speech signal modeling. Our experiments revealed that these advanced recurrent units are indeed better than more traditional recurrent units such as tanh units. Also, we found GRU to be comparable to LSTM.

citation-role summary

background 8 · method 3 · baseline 1 · other 1
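The 62% background share quoted near the top of this page appears consistent with these counts, assuming the percentage is taken over the 13 classified citations (8 + 3 + 1 + 1) rather than over all 87 citing papers. A minimal sketch of that arithmetic follows; the name `role_counts` is illustrative and not part of the page's data format.

```python
# Counts copied from the citation-role summary above.
# The dictionary name and structure are illustrative assumptions.
role_counts = {"background": 8, "method": 3, "baseline": 1, "other": 1}

# Assumption: shares are computed over classified citations only.
total = sum(role_counts.values())

# Percentage share per role, rounded to the nearest whole percent.
shares = {role: round(100 * n / total) for role, n in role_counts.items()}

print(total, shares["background"])
```

Under this assumption, 8 of 13 classified citations round to 62%, which matches the headline figure.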

citation-polarity summary

background 8 · use method 3 · baseline 1 · unclear 1

representative citing papers

CanViT: Toward Active-Vision Foundation Models

cs.CV · 2026-03-23 · conditional · novelty 8.0

CanViT is the first task- and policy-agnostic AVFM pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses, achieving 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 after fine-tuning.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms

hep-ph · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Nested-GPT is an autoregressive Transformer surrogate that generates variable-multiplicity parton showers while enforcing ordered Markovian branching and matches reference Monte Carlo results for leading-log non-global logarithm resummation in the large-Nc limit.

Identify Then Project: Contrastive Learning of Latent Dynamics from Partial Observations with Port-Hamiltonian Structure

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

A two-stage contrastive teacher-student framework learns and then projects latent dynamics onto port-Hamiltonian submanifolds from partial observations.

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.

Vector-Quantized Discrete Latent Factors Meet Financial Priors: Dynamic Cross-Sectional Stock Ranking Prediction for Portfolio Construction

cs.LG · 2026-05-13 · conditional · novelty 7.0

PRISM-VQ integrates vector-quantized latent factors with financial priors and a structure-conditioned mixture-of-experts to deliver improved cross-sectional stock return predictions and portfolio performance on CSI 300 and S&P 500.

What-Where Transformer: A Slot-Centric Visual Backbone for Concurrent Representation and Localization

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

The What-Where Transformer achieves explicit what-where separation in a ViT-style backbone via concurrent token and attention-map streams, yielding emergent object discovery from attention maps and better weakly-supervised localization.

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

TailedTS supplies 24.69 billion Wikipedia page-view records as a public benchmark for heavy-tailed time series forecasting and periodicity analysis, revealing weaker periodic structure in high-traffic pages.

TCRTransBench: A Comprehensive Benchmark for Bidirectional TCR-Peptide Sequence Generation

q-bio.CB · 2026-05-06 · unverdicted · novelty 7.0

TCRTransBench provides a new benchmark with bidirectional TCR-peptide generation tasks, a large validated dataset, and metrics to evaluate neural models for immunological sequence modeling.

Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning

cs.LG · 2026-04-09 · unverdicted · novelty 7.0

CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels.

Oscillators Are All You Need: Irregular Time Series Modelling via Damped Harmonic Oscillators with Closed-Form Solutions

cs.LG · 2026-02-12 · unverdicted · novelty 7.0

Damped harmonic oscillators with closed-form solutions model keys, values, and queries in continuous attention for irregular time series, preserving universal approximation while being orders of magnitude faster than prior NODE-based methods.

Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering

cs.RO · 2026-01-30 · unverdicted · novelty 7.0

NeuroKalman mitigates state drift in vision-language UAV navigation by using memory-augmented Kalman filtering where attention retrieves historical anchors to correct predictions without gradient updates.

ExDoS: Expert-Guided Dual-Focus Cross-Modal Distillation for Smart Contract Vulnerability Detection

cs.CR · 2025-09-12 · unverdicted · novelty 7.0

ExDoS uses expert-guided dual-focus distillation between source semantic graphs and bytecode control-flow graphs plus a dual-attention network to improve smart contract vulnerability detection, reporting 3-6% F1 gains over baselines.

Unsupervised Learning of Local Updates for Maximum Independent Set in Dynamic Graphs

cs.LG · 2025-05-19 · unverdicted · novelty 7.0

Unsupervised GNN model learns local updates for approximate MaxIS on dynamic graphs, achieving competitive ratios on 200-1000 node instances and 1.00-1.18x larger solutions than other unsupervised models when generalizing to 100x larger graphs.

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

cs.LG · 2024-05-31 · unverdicted · novelty 7.0

Transformers and SSMs are unified through structured state space duality, producing a 2-8X faster Mamba-2 model that remains competitive with Transformers.

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

cs.LG · 2024-02-29 · unverdicted · novelty 7.0

Griffin hybrid model matches Llama-2 performance while trained on over 6 times fewer tokens and offers lower inference latency with higher throughput.

ComHymba: Low-Complexity Domain-Informed Foundation Model for Wireless Communications

eess.SP · 2026-05-22 · unverdicted · novelty 6.0

ComHymba introduces a domain-informed wireless foundation model with Hymba blocks for linear-complexity CSI modeling, reporting accuracy gains on eight downstream tasks and up to 3.3x inference speedup over Transformers.

Learning Transferable Topology Priors for Multi-Agent LLM Collaboration Across Domains

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

TopoPrior learns transferable topology priors offline from multi-domain reference graphs using a conditional variational graph model and adversarial adaptation to initialize collaboration structures for multi-agent LLM systems, reducing online search overhead.

What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.

Object Referring-Guided Scanpath Prediction with Perception-Enhanced Vision-Language Models

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

ScanVLA uses a vision-language model with a history-enhanced decoder and frozen segmentation LoRA to outperform prior methods on object-referring scanpath prediction.

UniDetect: LLM-Driven Universal Fraud Detection across Heterogeneous Blockchains

cs.CR · 2026-04-14 · unverdicted · novelty 6.0

UniDetect is an LLM-based system that generates universal transaction summary texts and uses two-stage multimodal training on text plus graphs to detect fraudulent accounts across heterogeneous blockchains, outperforming baselines by 5.57-7.58% KS and achieving over 94.58% zero-shot cross-chain and

Learning to Test: Physics-Informed Representation for Dynamical Instability Detection

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

A physics-informed neural representation is learned from safe data to support distributional hypothesis testing for dynamical instability in stochastic DAE systems without repeated simulations.

RF-LEGO: Modularized Signal Processing-Deep Learning Co-Design for RF Sensing via Deep Unrolling

cs.DC · 2026-04-11 · unverdicted · novelty 6.0

RF-LEGO turns signal processing algorithms into trainable modular DL modules via deep unrolling, outperforming pure SP and DL baselines in RF sensing while preserving interpretability.

Behavior-Aware Item Modeling via Dynamic Procedural Solution Representations for Knowledge Tracing

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

BAIM enriches knowledge tracing item representations by deriving stage-level embeddings from Polya's four problem-solving stages and routing them adaptively per learner context, yielding consistent gains over pretraining baselines on two datasets.

citing papers explorer

CanViT: Toward Active-Vision Foundation Models

cs.CV · 2026-03-23 · conditional · none · ref 36 · internal anchor

CanViT is the first task- and policy-agnostic AVFM pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses, achieving 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 after fine-tuning.

What-Where Transformer: A Slot-Centric Visual Backbone for Concurrent Representation and Localization

cs.CV · 2026-05-12 · unverdicted · none · ref 13 · internal anchor

The What-Where Transformer achieves explicit what-where separation in a ViT-style backbone via concurrent token and attention-map streams, yielding emergent object discovery from attention maps and better weakly-supervised localization.

Object Referring-Guided Scanpath Prediction with Perception-Enhanced Vision-Language Models

cs.CV · 2026-04-22 · unverdicted · none · ref 8 · internal anchor

ScanVLA uses a vision-language model with a history-enhanced decoder and frozen segmentation LoRA to outperform prior methods on object-referring scanpath prediction.

Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments

cs.CV · 2019-06-23 · unverdicted · none · ref 10 · internal anchor

The MIA model with GC, RGA, and BFM modules achieves state-of-the-art performance on the CUHK-PEDES dataset for description-based person re-identification.

HOI-aware Adaptive Network for Weakly-supervised Action Segmentation

cs.CV · 2026-04-29 · unverdicted · none · ref 5 · internal anchor

AdaAct employs a HOI encoder and two-branch hypernetwork to adaptively adjust temporal encoding parameters based on video-level human-object interactions for improved weakly-supervised action segmentation.

From Time-series Generation, Model Selection to Transfer Learning: A Comparative Review of Pixel-wise Approaches for Large-scale Crop Mapping

cs.CV · 2025-07-16 · unverdicted · none · ref 13 · internal anchor

A comparative review with experiments identifying optimal preprocessing, models, and transfer strategies for large-scale pixel-wise crop mapping using Landsat 8 data across five sites.

Two-stream Spatiotemporal Feature for Video QA Task

cs.CV · 2019-07-11 · unverdicted · none · ref 14 · internal anchor

A two-stream spatiotemporal feature extractor with squeeze-and-excitation and attention-based context matching improves text-only video QA on TVQA but shows limitations with visual features.

Recent Advances in Multi-Agent Human Trajectory Prediction: A Comprehensive Review

cs.CV · 2025-06-13 · unverdicted · none · ref 143 · internal anchor

A review categorizing 2020-2025 deep learning methods for multi-agent human trajectory prediction by architecture, input representations, and strategies, with emphasis on ETH/UCY benchmark evaluations and future challenges.

A Survey on Deep Learning Techniques for Action Anticipation

cs.CV · 2023-09-29 · unverdicted · none · ref 146 · internal anchor

A literature survey reviewing deep learning approaches to action anticipation in everyday scenarios, with method classifications, dataset and metric summaries, and future directions.
