
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Caglar Gulcehre, Junyoung Chung, Kyunghyun Cho, Yoshua Bengio · 2014 · cs.NE · arXiv 1412.3555

Mixed citation behavior. Most common role is background (62%).

125 Pith papers citing it

Background 62% of classified citations


abstract

In this paper we compare different types of recurrent units in recurrent neural networks (RNNs). Especially, we focus on more sophisticated units that implement a gating mechanism, such as a long short-term memory (LSTM) unit and a recently proposed gated recurrent unit (GRU). We evaluate these recurrent units on the tasks of polyphonic music modeling and speech signal modeling. Our experiments revealed that these advanced recurrent units are indeed better than more traditional recurrent units such as tanh units. Also, we found GRU to be comparable to LSTM.
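The abstract contrasts gated units with the traditional tanh unit but does not spell out either. The pure-Python sketch below shows one recurrent step of each. It assumes the standard GRU formulation (update gate z_t, reset gate r_t, candidate state h̃_t) with bias terms omitted. The parameter names (`W`, `U`, `Wz`, `Uz`, `Wr`, `Ur`) and helpers (`matvec`, `run`, `random_params`) are hypothetical and are not code from the paper. The LSTM is not shown.

```python
import math
import random

# Hypothetical helpers for a dependency-free sketch: matrices are lists of rows.
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh_step(p, x, h_prev):
    """Traditional tanh unit: h_t = tanh(W x_t + U h_{t-1})."""
    pre = [a + b for a, b in zip(matvec(p["W"], x), matvec(p["U"], h_prev))]
    return [math.tanh(v) for v in pre]

def gru_step(p, x, h_prev):
    """One GRU step, biases omitted (an assumption made for brevity)."""
    # Update gate z_t: how strongly the candidate overwrites the old state.
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h_prev))]
    # Reset gate r_t: how much of h_{t-1} feeds into the candidate.
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h_prev))]
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    # Candidate state: tanh(W x_t + U (r_t * h_{t-1})).
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["W"], x), matvec(p["U"], reset_h))]
    # h_t = (1 - z_t) * h_{t-1} + z_t * candidate, element-wise.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]

def random_params(n_in, n_hid, seed=0):
    """Small random weights; W and U are shared by both unit types here."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"W": mat(n_hid, n_in), "U": mat(n_hid, n_hid),
            "Wz": mat(n_hid, n_in), "Uz": mat(n_hid, n_hid),
            "Wr": mat(n_hid, n_in), "Ur": mat(n_hid, n_hid)}

def run(step, p, xs, n_hid):
    """Unroll a step function over a sequence, starting from a zero state."""
    h = [0.0] * n_hid
    for x in xs:
        h = step(p, x, h)
    return h

# Usage: encode a short 3-dimensional sequence into a 4-unit state.
params = random_params(n_in=3, n_hid=4, seed=1)
seq = [[0.5, -1.0, 0.25], [1.0, 0.0, -0.5], [0.0, 0.3, 0.9]]
h_tanh = run(tanh_step, params, seq, 4)
h_gru = run(gru_step, params, seq, 4)
```

The final interpolation illustrates the gating design. When z_t is close to 0, the GRU copies its previous state forward almost unchanged, which a plain tanh unit cannot do. Some implementations swap the roles of z_t and 1 − z_t in this interpolation, so the gate's meaning depends on the convention used.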

citation-role summary

background: 8 · method: 3 · baseline: 1 · other: 1
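As a quick check of the 62% figure, the sketch below assumes that the percentage is background's share of the 13 citations counted in this role summary (8 + 3 + 1 + 1), rounded to the nearest whole percent. The helper name `role_shares` is hypothetical.

```python
def role_shares(counts):
    """Percentage share of each role among classified citations, rounded to whole percent."""
    total = sum(counts.values())
    return {role: round(100 * n / total) for role, n in counts.items()}

# Counts copied from the citation-role summary.
roles = {"background": 8, "method": 3, "baseline": 1, "other": 1}
shares = role_shares(roles)
print(shares["background"])  # 8 / 13 = 61.5..., printed as 62
```

The sketch also shows that the percentage is computed over the 13 classified citations, not over all 125 citing papers.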

citation-polarity summary

background: 8 · use method: 3 · baseline: 1 · unclear: 1

authors

Caglar Gulcehre Junyoung Chung Kyunghyun Cho Yoshua Bengio

representative citing papers

CanViT: Toward Active-Vision Foundation Models

cs.CV · 2026-03-23 · conditional · novelty 8.0

CanViT is the first task- and policy-agnostic AVFM pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses, achieving 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 after fine-tuning.

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

cs.LG · 2023-12-01 · unverdicted · novelty 8.0

Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.

Identifying Latent Concepts and Structures for Generalized Category Discovery

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

CPF-GCD enforces low-rank compositional structure on vision backbone features via spatial primitive fields so that novel categories emerge as new activation patterns over a shared vocabulary of reusable visual primitives.

LongSpike: Fractional Order Spiking State Space Models for Efficient Long Sequence Learning

cs.LG · 2026-06-11 · unverdicted · novelty 7.0

LongSpike integrates fractional-order state-space modeling into spiking neural networks, enabling better long-sequence performance than prior SNNs on LRA, WikiText-103, and Speech Commands benchmarks while retaining sparse computation.

CoMetaPNS: Continually Meta-learning Personalized Neural Surrogates for Cardiac Electrophysiology Simulations

cs.LG · 2026-06-05 · unverdicted · novelty 7.0

CoMetaPNS combines meta-learned neural surrogates with a continual Bayesian Gaussian Mixture Model to adapt cardiac electrophysiology simulations to new data while avoiding catastrophic forgetting.

AdaState: Self-Evolving Anchors for Streaming Video Generation

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

AdaState replaces the static first-frame KV anchor with an evolving hidden latent that the model denoises alongside content, treating time as relative to enable recurrence and richer dynamics in streaming video generation.

LC-Flow: Learning Local Continuous Optical Flow and Confidence from events

cs.CV · 2026-05-23 · unverdicted · novelty 7.0

LC-Flow introduces a continuous local recurrent network for learning sparse optical flow and confidence directly from event streams, with confidence-guided aggregation reaching new SOTA on MVSEC.

Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms

hep-ph · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Nested-GPT is an autoregressive Transformer surrogate that generates variable-multiplicity parton showers while enforcing ordered Markovian branching and matches reference Monte Carlo results for leading-log non-global logarithm resummation in the large-Nc limit.

Identify Then Project: Contrastive Learning of Latent Dynamics from Partial Observations with Port-Hamiltonian Structure

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

A two-stage contrastive teacher-student framework learns and then projects latent dynamics onto port-Hamiltonian submanifolds from partial observations.

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.

Vector-Quantized Discrete Latent Factors Meet Financial Priors: Dynamic Cross-Sectional Stock Ranking Prediction for Portfolio Construction

cs.LG · 2026-05-13 · conditional · novelty 7.0

PRISM-VQ integrates vector-quantized latent factors with financial priors and a structure-conditioned mixture-of-experts to deliver improved cross-sectional stock return predictions and portfolio performance on CSI 300 and S&P 500.

What-Where Transformer: A Slot-Centric Visual Backbone for Concurrent Representation and Localization

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

The What-Where Transformer achieves explicit what-where separation in a ViT-style backbone via concurrent token and attention-map streams, yielding emergent object discovery from attention maps and better weakly-supervised localization.

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

TailedTS supplies 24.69 billion Wikipedia page-view records as a public benchmark for heavy-tailed time series forecasting and periodicity analysis, revealing weaker periodic structure in high-traffic pages.

TCRTransBench: A Comprehensive Benchmark for Bidirectional TCR-Peptide Sequence Generation

q-bio.CB · 2026-05-06 · unverdicted · novelty 7.0

TCRTransBench provides a new benchmark with bidirectional TCR-peptide generation tasks, a large validated dataset, and metrics to evaluate neural models for immunological sequence modeling.

Learning to Theorize the World from Observation

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.

Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning

cs.LG · 2026-04-09 · unverdicted · novelty 7.0

CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels.

Oscillators Are All You Need: Irregular Time Series Modelling via Damped Harmonic Oscillators with Closed-Form Solutions

cs.LG · 2026-02-12 · unverdicted · novelty 7.0

Damped harmonic oscillators with closed-form solutions model keys, values, and queries in continuous attention for irregular time series, preserving universal approximation while being orders of magnitude faster than prior NODE-based methods.

Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering

cs.RO · 2026-01-30 · unverdicted · novelty 7.0

NeuroKalman mitigates state drift in vision-language UAV navigation by using memory-augmented Kalman filtering where attention retrieves historical anchors to correct predictions without gradient updates.

ExDoS: Expert-Guided Dual-Focus Cross-Modal Distillation for Smart Contract Vulnerability Detection

cs.CR · 2025-09-12 · unverdicted · novelty 7.0

ExDoS uses expert-guided dual-focus distillation between source semantic graphs and bytecode control-flow graphs plus a dual-attention network to improve smart contract vulnerability detection, reporting 3-6% F1 gains over baselines.

Unsupervised Learning of Local Updates for Maximum Independent Set in Dynamic Graphs

cs.LG · 2025-05-19 · unverdicted · novelty 7.0

Unsupervised GNN model learns local updates for approximate MaxIS on dynamic graphs, achieving competitive ratios on 200-1000 node instances and 1.00-1.18x larger solutions than other unsupervised models when generalizing to 100x larger graphs.

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

cs.LG · 2024-05-31 · unverdicted · novelty 7.0

Transformers and SSMs are unified through structured state space duality, producing a 2-8X faster Mamba-2 model that remains competitive with Transformers.

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

cs.LG · 2024-02-29 · unverdicted · novelty 7.0

The Griffin hybrid model matches Llama-2 performance despite being trained on over 6 times fewer tokens, and it offers lower inference latency and higher throughput.

Estimation-Prediction Tradeoff in Causal Probabilistic Temporal Graphs

cs.LG · 2026-06-26 · unverdicted · novelty 6.0

Characterizes an estimation-prediction tradeoff in binary logistic models for causal probabilistic temporal graphs and proposes a framework to jointly evaluate temporal link prediction with causal parameter recovery via Cramér-Rao bounds.

Temporal-Aware Reasoning Optimization for Video Temporal Grounding

cs.CV · 2026-06-08 · unverdicted · novelty 6.0

TaRO improves video temporal grounding in MLLMs via constructive reasoning exploration from dense captions and a temporal-sensitivity reward that uses logit drops on disrupted event boundaries, followed by curriculum learning, reaching SOTA results.

citing papers explorer

Showing 50 of 125 citing papers.

CanViT: Toward Active-Vision Foundation Models cs.CV · 2026-03-23 · conditional · none · ref 36
CanViT is the first task- and policy-agnostic AVFM pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses, achieving 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 after fine-tuning.
Mamba: Linear-Time Sequence Modeling with Selective State Spaces cs.LG · 2023-12-01 · unverdicted · none · ref 17
Mamba is a linear-time sequence model using input-dependent selective SSMs that achieves SOTA results across modalities and matches twice-larger Transformers on language modeling with 5x higher inference throughput.
Identifying Latent Concepts and Structures for Generalized Category Discovery cs.CV · 2026-07-01 · unverdicted · none · ref 121
CPF-GCD enforces low-rank compositional structure on vision backbone features via spatial primitive fields so that novel categories emerge as new activation patterns over a shared vocabulary of reusable visual primitives.
LongSpike: Fractional Order Spiking State Space Models for Efficient Long Sequence Learning cs.LG · 2026-06-11 · unverdicted · none · ref 4
LongSpike integrates fractional-order state-space modeling into spiking neural networks, enabling better long-sequence performance than prior SNNs on LRA, WikiText-103, and Speech Commands benchmarks while retaining sparse computation.
CoMetaPNS: Continually Meta-learning Personalized Neural Surrogates for Cardiac Electrophysiology Simulations cs.LG · 2026-06-05 · unverdicted · none · ref 39
CoMetaPNS combines meta-learned neural surrogates with a continual Bayesian Gaussian Mixture Model to adapt cardiac electrophysiology simulations to new data while avoiding catastrophic forgetting.
AdaState: Self-Evolving Anchors for Streaming Video Generation cs.CV · 2026-05-28 · unverdicted · none · ref 3
AdaState replaces the static first-frame KV anchor with an evolving hidden latent that the model denoises alongside content, treating time as relative to enable recurrence and richer dynamics in streaming video generation.
LC-Flow: Learning Local Continuous Optical Flow and Confidence from events cs.CV · 2026-05-23 · unverdicted · none · ref 7
LC-Flow introduces a continuous local recurrent network for learning sparse optical flow and confidence directly from event streams, with confidence-guided aggregation reaching new SOTA on MVSEC.
Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms hep-ph · 2026-05-18 · unverdicted · none · ref 71 · 2 links
Nested-GPT is an autoregressive Transformer surrogate that generates variable-multiplicity parton showers while enforcing ordered Markovian branching and matches reference Monte Carlo results for leading-log non-global logarithm resummation in the large-Nc limit.
Identify Then Project: Contrastive Learning of Latent Dynamics from Partial Observations with Port-Hamiltonian Structure cs.LG · 2026-05-15 · unverdicted · none · ref 4
A two-stage contrastive teacher-student framework learns and then projects latent dynamics onto port-Hamiltonian submanifolds from partial observations.
TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment cs.CL · 2026-05-13 · unverdicted · none · ref 89
TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.
Vector-Quantized Discrete Latent Factors Meet Financial Priors: Dynamic Cross-Sectional Stock Ranking Prediction for Portfolio Construction cs.LG · 2026-05-13 · conditional · none · ref 6
PRISM-VQ integrates vector-quantized latent factors with financial priors and a structure-conditioned mixture-of-experts to deliver improved cross-sectional stock return predictions and portfolio performance on CSI 300 and S&P 500.
What-Where Transformer: A Slot-Centric Visual Backbone for Concurrent Representation and Localization cs.CV · 2026-05-12 · unverdicted · none · ref 13
The What-Where Transformer achieves explicit what-where separation in a ViT-style backbone via concurrent token and attention-map streams, yielding emergent object discovery from attention maps and better weakly-supervised localization.
TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification cs.LG · 2026-05-09 · unverdicted · none · ref 15
TailedTS supplies 24.69 billion Wikipedia page-view records as a public benchmark for heavy-tailed time series forecasting and periodicity analysis, revealing weaker periodic structure in high-traffic pages.
TCRTransBench: A Comprehensive Benchmark for Bidirectional TCR-Peptide Sequence Generation q-bio.CB · 2026-05-06 · unverdicted · none · ref 14
TCRTransBench provides a new benchmark with bidirectional TCR-peptide generation tasks, a large validated dataset, and metrics to evaluate neural models for immunological sequence modeling.
Learning to Theorize the World from Observation cs.LG · 2026-05-05 · unverdicted · none · ref 85
NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.
Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning cs.LG · 2026-04-09 · unverdicted · none · ref 46
CLOVER augments value decomposition with a GNN mixer whose weights depend on the realized wireless communication graph, proving permutation invariance, monotonicity, and greater expressiveness than QMIX while showing gains on Predator-Prey and Lumberjacks under p-CSMA channels.
Oscillators Are All You Need: Irregular Time Series Modelling via Damped Harmonic Oscillators with Closed-Form Solutions cs.LG · 2026-02-12 · unverdicted · none · ref 1
Damped harmonic oscillators with closed-form solutions model keys, values, and queries in continuous attention for irregular time series, preserving universal approximation while being orders of magnitude faster than prior NODE-based methods.
Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering cs.RO · 2026-01-30 · unverdicted · none · ref 4
NeuroKalman mitigates state drift in vision-language UAV navigation by using memory-augmented Kalman filtering where attention retrieves historical anchors to correct predictions without gradient updates.
ExDoS: Expert-Guided Dual-Focus Cross-Modal Distillation for Smart Contract Vulnerability Detection cs.CR · 2025-09-12 · unverdicted · none · ref 63
ExDoS uses expert-guided dual-focus distillation between source semantic graphs and bytecode control-flow graphs plus a dual-attention network to improve smart contract vulnerability detection, reporting 3-6% F1 gains over baselines.
Unsupervised Learning of Local Updates for Maximum Independent Set in Dynamic Graphs cs.LG · 2025-05-19 · unverdicted · none · ref 15
Unsupervised GNN model learns local updates for approximate MaxIS on dynamic graphs, achieving competitive ratios on 200-1000 node instances and 1.00-1.18x larger solutions than other unsupervised models when generalizing to 100x larger graphs.
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality cs.LG · 2024-05-31 · unverdicted · none · ref 21
Transformers and SSMs are unified through structured state space duality, producing a 2-8X faster Mamba-2 model that remains competitive with Transformers.
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models cs.LG · 2024-02-29 · unverdicted · none · ref 7
The Griffin hybrid model matches Llama-2 performance despite being trained on over 6 times fewer tokens, and it offers lower inference latency and higher throughput.
Estimation-Prediction Tradeoff in Causal Probabilistic Temporal Graphs cs.LG · 2026-06-26 · unverdicted · none · ref 223
Characterizes an estimation-prediction tradeoff in binary logistic models for causal probabilistic temporal graphs and proposes a framework to jointly evaluate temporal link prediction with causal parameter recovery via Cramér-Rao bounds.
Temporal-Aware Reasoning Optimization for Video Temporal Grounding cs.CV · 2026-06-08 · unverdicted · none · ref 72
TaRO improves video temporal grounding in MLLMs via constructive reasoning exploration from dense captions and a temporal-sensitivity reward that uses logit drops on disrupted event boundaries, followed by curriculum learning, reaching SOTA results.
Larch: Learned Query Optimization for Semantic Predicates cs.DB · 2026-06-06 · unverdicted · none · ref 11
Larch uses a GNN-MDP formulation and a selectivity predictor plus dynamic programming to reorder semantic filter evaluation, cutting token usage 3x-19x versus prior systems on real and synthetic workloads.
Pretraining Recurrent Networks without Recurrence cs.LG · 2026-06-04 · unverdicted · none · ref 18
SMT reduces RNN training to supervised learning on memory transitions (m_t, x_{t+1}) to m_{t+1} obtained from a Transformer encoder, enabling time-parallel training with O(1) gradient paths.
Generating Financial Time Series by Matching Random Convolutional Features cs.LG · 2026-06-03 · unverdicted · none · ref 93
Introduces SOCK (SOft Competing Kernels), a differentiable random convolutional feature map, to train generative models of financial time series via feature matching and shows outperformance over signature and diffusion baselines on small-sample datasets.
ReSGA: A Large Tail Risk Model for Learning Value-at-Risk and Expected Shortfall stat.ML · 2026-06-03 · unverdicted · none · ref 12
ReSGA, a large autoencoder, outperforms prior methods on joint VaR-ES forecasting for US equities and converts the edge into economic gains via a size-enhanced momentum strategy, with gains attributed to data complexity.
Physically-Constrained Mamba-SDE for Remaining Useful Life Prediction under Irregular Observations cs.AI · 2026-06-01 · unverdicted · none · ref 9
PC-MambaSDE combines Mamba with physics-constrained SDE for RUL prediction under irregular observations, with theoretical stability guarantees and empirical outperformance on benchmarks.
Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring cs.RO · 2026-05-29 · unverdicted · none · ref 64
Hide-and-Seek uses contrastive objectives on trajectories to localize failure signals in VLA models from trajectory-level supervision alone.
IP-Adapter Is All You Need: Towards Fine-Tuning-Free Diffusion-Based Talking Face Generation cs.CV · 2026-05-28 · unverdicted · none · ref 6
A fine-tuning-free framework combines pretrained Stable Diffusion with IP-Adapter plus three parameter-free modules to achieve improved lip synchronization and visual quality in talking face generation.
PHGNet: Prototype-Guided Hypergraph Construction for Heterogeneous Spatiotemporal Forecasting cs.AI · 2026-05-25 · unverdicted · none · ref 4
PHGNet proposes prototype-guided hypergraph construction plus global-local representations and temporal attention to model high-order spatiotemporal dependencies in traffic data and reports better performance than prior methods on real datasets.
ComHymba: Low-Complexity Domain-Informed Foundation Model for Wireless Communications eess.SP · 2026-05-22 · unverdicted · none · ref 31
ComHymba introduces a domain-informed wireless foundation model with Hymba blocks for linear-complexity CSI modeling, reporting accuracy gains on eight downstream tasks and up to 3.3x inference speedup over Transformers.
Spectra as Language: Large Language Models for Scalable Stellar Parameter and Abundance Inference astro-ph.IM · 2026-05-21 · unverdicted · none · ref 6 · 3 links
Two-stage LLM framework infers stellar parameters and ~20 elemental abundances from spectra, showing performance gains with increasing data volume.
Learning Transferable Topology Priors for Multi-Agent LLM Collaboration Across Domains cs.CL · 2026-05-17 · unverdicted · none · ref 56
TopoPrior learns transferable topology priors offline from multi-domain reference graphs using a conditional variational graph model and adversarial adaptation to initialize collaboration structures for multi-agent LLM systems, reducing online search overhead.
What If We Let Forecasting Forget? A Sparse Bottleneck for Cross-Variable Dependencies cs.LG · 2026-05-08 · unverdicted · none · ref 97
MS-FLOW uses a capacity-limited sparse routing mechanism to model only critical inter-variable dependencies in time series data, achieving state-of-the-art accuracy on 12 benchmarks with fewer but more reliable connections.
DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions cs.RO · 2026-05-07 · unverdicted · none · ref 26 · 2 links
DexSynRefine couples HOI motion manifold flow primitives with task-space residual RL and proprioceptive adaptation to convert human-object interaction data into executable dexterous robot motions, reporting 50-70 point real-world success rate gains over kinematic retargeting on five tasks.
Object Referring-Guided Scanpath Prediction with Perception-Enhanced Vision-Language Models cs.CV · 2026-04-22 · unverdicted · none · ref 8
ScanVLA uses a vision-language model with a history-enhanced decoder and frozen segmentation LoRA to outperform prior methods on object-referring scanpath prediction.
UniDetect: LLM-Driven Universal Fraud Detection across Heterogeneous Blockchains cs.CR · 2026-04-14 · unverdicted · none · ref 2
UniDetect is an LLM-based system that generates universal transaction summary texts and uses two-stage multimodal training on text plus graphs to detect fraudulent accounts across heterogeneous blockchains, outperforming baselines by 5.57-7.58% KS and achieving over 94.58% zero-shot cross-chain and
Learning to Test: Physics-Informed Representation for Dynamical Instability Detection cs.LG · 2026-04-13 · unverdicted · none · ref 16
A physics-informed neural representation is learned from safe data to support distributional hypothesis testing for dynamical instability in stochastic DAE systems without repeated simulations.
RF-LEGO: Modularized Signal Processing-Deep Learning Co-Design for RF Sensing via Deep Unrolling cs.DC · 2026-04-11 · unverdicted · none · ref 12
RF-LEGO turns signal processing algorithms into trainable modular DL modules via deep unrolling, outperforming pure SP and DL baselines in RF sensing while preserving interpretability.
Behavior-Aware Item Modeling via Dynamic Procedural Solution Representations for Knowledge Tracing cs.CL · 2026-04-09 · unverdicted · none · ref 2
BAIM enriches knowledge tracing item representations by deriving stage-level embeddings from Polya's four problem-solving stages and routing them adaptively per learner context, yielding consistent gains over pretraining baselines on two datasets.
CWRNN-INVR: A Coupled WarpRNN based Implicit Neural Video Representation eess.IV · 2026-04-08 · unverdicted · none · ref 39
CWRNN-INVR combines WarpRNN for structured video information and residual grids for irregular details to reach 33.73 dB average PSNR on the UVG dataset at 3M parameters, outperforming existing INVR methods.
M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling cs.LG · 2026-03-15 · unverdicted · none · ref 5
M²RNN achieves perfect state tracking at unseen lengths and outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity on 7B models with 3x smaller recurrent states.
Uniform Inductive Spatio-Temporal Kriging cs.AI · 2026-03-05 · unverdicted · none · ref 4
UniSTOK improves inductive spatio-temporal kriging under incomplete observations by reliability-guided signal regulation and residual bias calibration.
SCASRec: A Self-Correcting and Auto-Stopping Model for Generative Route List Recommendation cs.IR · 2026-02-03 · unverdicted · none · ref 10
SCASRec unifies ranking and redundancy elimination for route lists via stepwise corrective rewards and an adaptive end-of-recommendation token, claiming SOTA results on two datasets and real deployment.
Short window attention enables long-term memorization cs.LG · 2025-09-29 · unverdicted · none · ref 7
Short sliding windows in hybrid attention-xLSTM models boost long-context performance by encouraging long-term memory use, and stochastic window sizing improves both short and long tasks.
Flow marching for a generative PDE foundation model cs.LG · 2025-09-23 · unverdicted · none · ref 15
Flow Marching jointly samples noise and physical time to learn a velocity field for generative PDE modeling, paired with a latent autoencoder and efficient transformer for large-scale pretraining on 2.5M trajectories.
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention cs.CL · 2025-06-16 · unverdicted · none · ref 5
MiniMax-M1 is a 456B parameter hybrid-attention MoE model trained with CISPO RL that achieves performance comparable or superior to DeepSeek-R1 and Qwen3-235B on reasoning and software engineering tasks while training in three weeks on 512 GPUs.
Logo-LLM: Local and Global Modeling with Large Language Models for Time Series Forecasting cs.LG · 2025-05-16 · unverdicted · none · ref 2
Logo-LLM improves time series forecasting by pulling local dynamics from shallow LLM layers and global trends from deeper layers, then aligning them via new Local-Mixer and Global-Mixer modules.