Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Kyunghyun Cho · 2014 · cs.CL · arXiv 1406.1078

Mixed citation behavior. Most common role is background (47%).

115 Pith papers citing it

Background 47% of classified citations

abstract

In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
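The architecture described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes plain tanh recurrences, one-hot inputs, and untrained random weights, and every name in it (`TinyEncoderDecoder`, `encode`, `log_prob`) is hypothetical. It shows only the two properties the abstract states: the encoder maps a source sequence of any length to a fixed-length vector, and the decoder assigns a conditional log-probability to a target sequence given that vector. Training would adjust the weights to maximize that log-probability over sequence pairs; no training loop is included here.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; the 0.1 scale is an arbitrary choice."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


class TinyEncoderDecoder:
    """Hypothetical two-RNN model; plain tanh units are an assumption."""

    def __init__(self, src_vocab, tgt_vocab, hidden, seed=0):
        rng = random.Random(seed)
        self.src_vocab, self.tgt_vocab, self.hidden = src_vocab, tgt_vocab, hidden
        self.W_enc = rand_matrix(hidden, src_vocab, rng)
        self.U_enc = rand_matrix(hidden, hidden, rng)
        self.W_dec = rand_matrix(hidden, tgt_vocab, rng)
        self.U_dec = rand_matrix(hidden, hidden, rng)
        self.W_out = rand_matrix(tgt_vocab, hidden, rng)

    def _step(self, W, U, x, h):
        # One recurrent update: h_new = tanh(W x + U h).
        return [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, h))]

    def encode(self, source):
        """Read the whole source; the final state is the fixed-length summary."""
        h = [0.0] * self.hidden
        for sym in source:
            h = self._step(self.W_enc, self.U_enc, one_hot(sym, self.src_vocab), h)
        return h

    def log_prob(self, source, target):
        """Sum of log p(y_t | y_<t, summary) over the target sequence."""
        h = self.encode(source)            # decoder starts from the summary
        prev = [0.0] * self.tgt_vocab      # all-zero vector stands in for a start symbol
        total = 0.0
        for sym in target:
            h = self._step(self.W_dec, self.U_dec, prev, h)
            total += log_softmax(matvec(self.W_out, h))[sym]
            prev = one_hot(sym, self.tgt_vocab)
        return total


model = TinyEncoderDecoder(src_vocab=5, tgt_vocab=4, hidden=8)
summary_short = model.encode([1, 2])
summary_long = model.encode([1, 2, 3, 4, 0, 2])
score = model.log_prob([1, 2, 3], [0, 3, 1])
```

Both summaries have the same length (the hidden size) regardless of source length, which is the fixed-length bottleneck the abstract refers to. The value in `score` is the kind of phrase-pair score the abstract describes adding as a feature to a log-linear translation model, here computed from untrained weights.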

citation-role summary

background 8 · method 5 · baseline 1 · dataset 1

citation-polarity summary

background 7 · use method 5 · baseline 1 · unclear 1 · use dataset 1

claims ledger

method To validate whether the quality advantage of GS scenes translates to stronger navigation agents, we train five agent groups under different scene-domain mixtures, with training budget fixed at 5×10^7 steps: A: 100 mesh scenes, B: 100 GS scenes, C: 80M + 20G, D: 50M + 50G, and E: 20M + 80G. All agents share a unified DD-PPO [30] architecture with a ResNet [6] visual encoder and a GRU [4] policy head, receiving 256×256 RGB and depth observations, with only training scene composition varying. Each agent i
method Recent works [54,62] have demonstrated that incorporating G-buffers and temporal cues can significantly improve reconstruction quality. Motivated by this, we propose a Geometry-Temporal Recurrent Refinement Network to further enhance the interpolated outputs. As illustrated in Fig. 3 (a), the proposed network employs a modified gated recurrent unit (GRU) [9] architecture. The network takes as input the current low-resolution frame I and its gradients G, the depth map D, the norm
method and we map the state $H^t$ to queries, keys and values with affine projections using learned parameter matrices $W^Q \in \mathbb{R}^{d \times d/k}$, $W^K \in \mathbb{R}^{d \times d/k}$, $W^V \in \mathbb{R}^{d \times d/k}$ and $W^O \in \mathbb{R}^{d \times d}$. At step $t$, the UT then computes revised representations $H^t \in \mathbb{R}^{m \times d}$ for all $m$ input positions as follows: $H^t = \mathrm{LayerNorm}(A^t + \mathrm{Transition}(A^t))$ (4), where $A^t = \mathrm{LayerNorm}((H^{t-1} + P^t) + \mathrm{MultiHeadSelfAttention}(H^{t-1} + P^t))$ (5), where LayerNorm() is defined in Ba et al. (2016), and Transition() and $P^t$ are discussed below. Depending on the task, w
dataset and the kinematics are deterministic; digits overlap without collision, and bouncing reflections are non-linear. (2) PhyWorld Collision 30K [Kang et al., 2025] comprises roughly 30 thousand simulated rigid-body collisions with varying ball radii and velocities. It tests OOD generalisation and strict physical consistency (momentum and kinetic energy conservation). (3) WeatherBench (2m temperature) [Rasp et al., 2020] is a processed version of the ERA5 archive [Hersbach et al., 2020], containing glob
background Middle: attention-based learning enables adaptive and reliability-aware aggregation of heterogeneous measurements. Right: representative applications, including radio map reconstruction, LEO satellite localization, and map-informed resource allocation. To address these challenges, attention mechanisms have emerged as an effective framework for adaptive information aggregation. Originally developed for neural machine translation [29], [30] and later generalized by Transformers [31], attention c
method $\theta_m(z_t, a_t) \to \hat{z}_{t+1}$) predicts the next latent state after a primitive action, and the decoder ($\mathrm{Dec}_{\theta_d}(\hat{z}_{t+1}) \to \hat{s}_{t+1}$) reconstructs next-state information from the predicted latent. The purpose is not to use this model as a long-horizon planner, but to learn a dynamics-aware latent space where behaviorally similar states can merge and useful hubs can emerge. We use a GRU [9] memory module with clipped history so the latent can capture recent context without memorizing full trajectory identity. Ad

authors

Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Kyunghyun Cho

representative citing papers

PathVQA: 30000+ Questions for Medical Visual Question Answering

cs.CL · 2020-03-07 · accept · novelty 8.0

PathVQA is the first public dataset of over 32,000 questions on nearly 5,000 pathology images for medical visual question answering.

Identifying Latent Concepts and Structures for Generalized Category Discovery

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

CPF-GCD enforces low-rank compositional structure on vision backbone features via spatial primitive fields so that novel categories emerge as new activation patterns over a shared vocabulary of reusable visual primitives.

An Algebraic View of the Expressivity of Recurrent Language Models

cs.FL · 2026-06-01 · unverdicted · novelty 7.0

A unified algebraic account reduces RNN expressivity to syntactic monoid division in wreath products and shows diagonal state-space models realize every even-modulus counter under unsigned-integer quantization but none under floating-point recurrences.

Learned Memory Attenuation in Sage-Husa Kalman Filters for Robust UAV State Estimation

eess.SP · 2026-05-18 · unverdicted · novelty 7.0

NDR-SHKF replaces the static forgetting factor in Sage-Husa Kalman Filters with a learned vector-valued memory attenuation policy from a bifurcated recurrent network trained end-to-end on whitened innovations to minimize estimation error.

Nested-GPT for variable-multiplicity parton showers: A case study in the resummation of non-global logarithms

hep-ph · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

Nested-GPT is an autoregressive Transformer surrogate that generates variable-multiplicity parton showers while enforcing ordered Markovian branching and matches reference Monte Carlo results for leading-log non-global logarithm resummation in the large-Nc limit.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo

cond-mat.str-el · 2026-05-13 · conditional · novelty 7.0

PSR-NQS makes recurrent neural quantum states scalable for variational Monte Carlo by using parallel scan recurrence, reaching accurate results on 52x52 two-dimensional lattices.

Zero-shot Imitation Learning by Latent Topology Mapping

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

ZALT learns latent hub states and hub-to-hub dynamics from demonstrations to plan zero-shot solutions for unseen start-goal tasks, achieving 55% success in a 3D maze versus 6% for baselines.

Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

cs.CV · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.

How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.

FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.

Geometry-Induced Long-Range Correlations in Recurrent Neural Network Quantum States

quant-ph · 2026-04-09 · conditional · novelty 7.0

Dilated RNN wave functions induce power-law correlations for the critical 1D transverse-field Ising model and the Cluster state, unlike the exponential decay of conventional RNN ansatze.

A Clinical Point Cloud Paradigm for In-Hospital Mortality Prediction from Multi-Level Incomplete Multimodal EHRs

cs.LG · 2026-04-06 · unverdicted · novelty 7.0

HealthPoint represents clinical events as points in a 4D space (content, time, modality, case) and applies low-rank relational attention to achieve state-of-the-art mortality prediction from multi-level incomplete multimodal EHRs.

Denoising Particle Filters: Learning State Estimation with Single-Step Objectives

cs.RO · 2026-02-23 · conditional · novelty 7.0

Denoising particle filters train state estimators on individual transitions via score matching, then use the learned denoiser with a dynamics model to approximate Bayesian filtering step-by-step, matching end-to-end baselines while preserving composability.

MELT: A Behavioral Trace Dataset for High-Risk Memecoin Launch Detection

cs.CR · 2026-02-13 · unverdicted · novelty 7.0

MELT is the first behavioral trace dataset for high-risk memecoin launch detection on Solana, providing 122 features, risk annotations, and ML benchmarks that reduce investment loss when used for selection.

Cognitive Alpha Mining via LLM-Driven Code-Based Evolution

cs.CL · 2025-11-24 · unverdicted · novelty 7.0

CogAlpha combines LLM reasoning with code-level evolutionary search to discover financial alphas that show higher predictive accuracy and generalization than prior methods on five stock datasets.

Mastering Diverse Domains through World Models

cs.AI · 2023-01-10 · unverdicted · novelty 7.0

DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.

Human Motion Diffusion Model

cs.CV · 2022-09-29 · unverdicted · novelty 7.0

MDM is a classifier-free diffusion model that generates expressive human motions by predicting clean samples rather than noise, supporting text and action conditioning and outperforming prior methods on standard benchmarks.

Mastering Atari with Discrete World Models

cs.LG · 2020-10-05 · accept · novelty 7.0

DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.

Brno Mobile OCR Dataset

cs.CV · 2019-07-02 · accept · novelty 7.0

Introduces B-MOD dataset of 19,728 mobile device photos of documents with precise text line annotations and a neural baseline showing high error rates on harder images.

Graph Attention Networks

stat.ML · 2017-10-30 · accept · novelty 7.0

Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.

Mixed Precision Training

cs.AI · 2017-10-10 · accept · novelty 7.0

Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.

Relevance Is Not Permission: Warranted Attention for Value Contributions

cs.AI · 2026-06-29 · unverdicted · novelty 6.0 · 2 refs

Warrant adds a query-item permission gate g_ij to attention value terms, improving primary metrics in 27 of 32 comparisons across CTDG, MTPP, RAG, STPP, and TKG tasks.

MoCo-AIS: A Contrastive Learning Framework for Similarity Computation of Vessel Trajectories

cs.AI · 2026-06-16 · unverdicted · novelty 6.0

MoCo-AIS is a MoCo-based contrastive learning framework that learns vessel trajectory embeddings and improves similarity computation over baselines on large-scale real-world AIS datasets while offering a benchmarking platform.

citing papers explorer

Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo

cond-mat.str-el · 2026-05-13 · conditional · none · ref 40 · internal anchor

PSR-NQS makes recurrent neural quantum states scalable for variational Monte Carlo by using parallel scan recurrence, reaching accurate results on 52x52 two-dimensional lattices.

Zero-shot Imitation Learning by Latent Topology Mapping

cs.LG · 2026-05-08 · unverdicted · none · ref 9 · internal anchor

ZALT learns latent hub states and hub-to-hub dynamics from demonstrations to plan zero-shot solutions for unseen start-goal tasks, achieving 55% success in a 3D maze versus 6% for baselines.

3DGS$^3$: Joint Super Sampling and Frame Interpolation for Real-Time Large-Scale 3DGS Rendering

cs.GR · 2026-05-12 · unverdicted · none · ref 9 · internal anchor

3DGS³ adds gradient-guided super-sampling and lightweight temporal interpolation to low-resolution 3DGS renders to produce high-resolution, high-frame-rate output without retraining the underlying scene representation.

Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting

cs.RO · 2026-04-14 · unverdicted · none · ref 4 · internal anchor

Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.

Universal Transformers

cs.CL · 2018-07-10 · unverdicted · none · ref 5 · internal anchor

Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.
