Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Mixed citation behavior. Most common role is background (47%).

105 Pith papers citing it
Background 47% of classified citations
abstract

In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
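
The abstract's two-network idea can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in a plain tanh recurrence for the paper's gated hidden unit, uses random untrained weights, and every name in it (`init_params`, `encode`, `target_log_prob`, the start-token id 0, the weight keys) is a hypothetical choice made for illustration. It shows only the shape of the computation: the encoder reduces a source sequence of any length to one fixed-length vector, and the decoder scores a target sequence conditioned on that vector.

```python
import math
import random

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def init_params(vocab, hidden, seed=0):
    # Random, untrained weights; the layout is an assumption of this sketch.
    rng = random.Random(seed)
    return {
        "hidden": hidden,
        "enc_in": rand_matrix(hidden, vocab, rng),
        "enc_h": rand_matrix(hidden, hidden, rng),
        "dec_in": rand_matrix(hidden, vocab, rng),
        "dec_h": rand_matrix(hidden, hidden, rng),
        "dec_c": rand_matrix(hidden, hidden, rng),
        "out": rand_matrix(vocab, hidden, rng),
    }

def rnn_step(W_in, W_h, token, h, extra=None):
    # Plain tanh recurrence (stand-in for a gated unit); token is an integer id.
    rec = matvec(W_h, h)
    if extra is None:
        extra = [0.0] * len(h)
    return [math.tanh(W_in[i][token] + rec[i] + extra[i]) for i in range(len(h))]

def encode(params, source):
    # Read the whole source; the final state is the fixed-length summary.
    h = [0.0] * params["hidden"]
    for tok in source:
        h = rnn_step(params["enc_in"], params["enc_h"], tok, h)
    return h

def log_softmax(scores):
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]

def target_log_prob(params, source, target, start_token=0):
    # log p(target | source): sum of per-step log-probabilities.
    c = encode(params, source)
    ctx = matvec(params["dec_c"], c)   # summary fed to every decoder step
    h = list(c)                        # decoder state initialised from c (assumption)
    prev, total = start_token, 0.0
    for tok in target:
        h = rnn_step(params["dec_in"], params["dec_h"], prev, h, ctx)
        total += log_softmax(matvec(params["out"], h))[tok]
        prev = tok
    return total

params = init_params(vocab=5, hidden=4)
print(len(encode(params, [1, 2])), len(encode(params, [1, 2, 3, 4, 1, 2])))
print(target_log_prob(params, [1, 2, 3], [2, 4]))
```

Joint training, as the abstract describes it, would adjust all weights to increase `target_log_prob` over the source-target pairs in the training data; the sketch stops at computing that quantity.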

hub tools

citation-role summary

background 8 · method 5 · baseline 1 · dataset 1

claims ledger

  • method To validate whether the quality advantage of GS scenes translates to stronger navigation agents, we train five agent groups under different scene-domain mixtures, with training budget fixed at 5×10^7 steps: A: 100 mesh scenes, B: 100 GS scenes, C: 80M + 20G, D: 50M + 50G, and E: 20M + 80G. All agents share a unified DD-PPO [30] architecture with a ResNet [6] visual encoder and a GRU [4] policy head, receiving 256×256 RGB and depth observations, with only training scene composition varying. Each agent i
  • method Recent works [54,62] have demonstrated that incorporating G-buffers and temporal cues can significantly improve reconstruction quality. Motivated by this, we propose a Geometry-Temporal Recurrent Refinement Network to further enhance the interpolated outputs. As illustrated in Fig. 3 (a), the proposed network employs a modified gated recurrent unit (GRU) [9] architecture. The network takes as input the current low-resolution frame I and its gradients G, the depth map D, the norm
  • method and we map the state $H^t$ to queries, keys and values with affine projections using learned parameter matrices $W^Q \in \mathbb{R}^{d \times d/k}$, $W^K \in \mathbb{R}^{d \times d/k}$, $W^V \in \mathbb{R}^{d \times d/k}$ and $W^O \in \mathbb{R}^{d \times d}$. At step $t$, the UT then computes revised representations $H^t \in \mathbb{R}^{m \times d}$ for all $m$ input positions as follows: $H^t = \mathrm{LayerNorm}(A^t + \mathrm{Transition}(A^t))$ (4), where $A^t = \mathrm{LayerNorm}((H^{t-1} + P^t) + \mathrm{MultiHeadSelfAttention}(H^{t-1} + P^t))$ (5), where LayerNorm() is defined in Ba et al. (2016), and Transition() and $P^t$ are discussed below. Depending on the task, w
  • dataset and the kinematics are deterministic; digits overlap without collision, and bouncing reflections are non-linear. (2) PhyWorld Collision 30K [Kang et al., 2025] comprises roughly 30 thousand simulated rigid-body collisions with varying ball radii and velocities. It tests OOD generalisation and strict physical consistency (momentum and kinetic energy conservation). (3) WeatherBench (2m temperature) [Rasp et al., 2020] is a processed version of the ERA5 archive [Hersbach et al., 2020], containing glob
  • background Middle: attention-based learning enables adaptive and reliability-aware aggregation of heterogeneous measurements. Right: representative applications, including radio map reconstruction, LEO satellite localization, and map-informed resource allocation. To address these challenges, attention mechanisms have emerged as an effective framework for adaptive information aggregation. Originally developed for neural machine translation [29], [30] and later generalized by Transformers [31], attention c
  • method $\theta_m (z_t, a_t) \to \hat{z}_{t+1}$) predicts the next latent state after a primitive action, and the decoder ($\mathrm{Dec}_{\theta_d}(\hat{z}_{t+1}) \to \hat{s}_{t+1}$) reconstructs next-state information from the predicted latent. The purpose is not to use this model as a long-horizon planner, but to learn a dynamics-aware latent space where behaviorally similar states can merge and useful hubs can emerge. We use a GRU [9] memory module with clipped history so the latent can capture recent context without memorizing full trajectory identity. Ad
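
Several of the method excerpts above cite this paper for the gated recurrent unit (GRU). As a rough reference for what that unit computes, here is a minimal single-step sketch using the commonly stated reset-gate and update-gate form. Bias terms are omitted, and the parameter names (`W_r`, `U_r`, `W_z`, `U_z`, `W`, `U`) and the dictionary layout are assumptions of this sketch, not taken from any of the citing papers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_step(p, x, h):
    """One gated update of hidden state h given input vector x (no biases)."""
    # Reset gate r: how much of the previous state feeds the candidate.
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h))]
    # Update gate z: how much of the previous state is carried over.
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x), matvec(p["U_z"], h))]
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    # Candidate state computed from the input and the reset-gated previous state.
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["W"], x), matvec(p["U"], gated_h))]
    # Interpolate between the previous state and the candidate.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

# Hypothetical all-zero weights: both gates sit at 0.5 and the candidate is 0,
# so the step simply halves the previous state.
p = {"W_r": zeros(2, 3), "U_r": zeros(2, 2), "W_z": zeros(2, 3),
     "U_z": zeros(2, 2), "W": zeros(2, 3), "U": zeros(2, 2)}
print(gru_step(p, [1.0, 2.0, 3.0], [0.4, -0.2]))
```

The update gate is what lets the state pass through many steps nearly unchanged when `z` is close to 1, which is the property the excerpts rely on when they use a GRU as a memory module or policy head.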

representative citing papers

Zero-shot Imitation Learning by Latent Topology Mapping

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

ZALT learns latent hub states and hub-to-hub dynamics from demonstrations to plan zero-shot solutions for unseen start-goal tasks, achieving 55% success in a 3D maze versus 6% for baselines.

Cognitive Alpha Mining via LLM-Driven Code-Based Evolution

cs.CL · 2025-11-24 · unverdicted · novelty 7.0

CogAlpha combines LLM reasoning with code-level evolutionary search to discover financial alphas that show higher predictive accuracy and generalization than prior methods on five stock datasets.

Mastering Diverse Domains through World Models

cs.AI · 2023-01-10 · unverdicted · novelty 7.0

DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.

Human Motion Diffusion Model

cs.CV · 2022-09-29 · unverdicted · novelty 7.0

MDM is a classifier-free diffusion model that generates expressive human motions by predicting clean samples rather than noise, supporting text and action conditioning and outperforming prior methods on standard benchmarks.

Mastering Atari with Discrete World Models

cs.LG · 2020-10-05 · accept · novelty 7.0

DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.

Brno Mobile OCR Dataset

cs.CV · 2019-07-02 · accept · novelty 7.0

Introduces B-MOD dataset of 19,728 mobile device photos of documents with precise text line annotations and a neural baseline showing high error rates on harder images.

Graph Attention Networks

stat.ML · 2017-10-30 · accept · novelty 7.0

Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.

Mixed Precision Training

cs.AI · 2017-10-10 · accept · novelty 7.0

Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.

Learning Cardiac Latent Representations in Vectorcardiogram Space

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

LVCG is the first self-supervised framework for learning view-invariant latent VCG representations that claims to outperform ECG-space baselines with better robustness and generalization in domain shift settings.

Generative Recursive Reasoning

cs.AI · 2026-05-19 · unverdicted · novelty 6.0 · 2 refs

GRAM is a latent-variable generative model that performs recursive reasoning via stochastic trajectories, trained with amortized variational inference to support multi-hypothesis reasoning and unconditional generation.

citing papers explorer

Showing 50 of 105 citing papers.