PathVQA is the first public dataset of over 32,000 questions on nearly 5,000 pathology images for medical visual question answering.
super hub Mixed citations
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Mixed citation behavior. Most common role is background (47%).
abstract
In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
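The abstract's encode-then-decode idea can be illustrated with a small sketch. This is not the authors' implementation: it swaps in plain tanh recurrences for the paper's gated hidden units, shares one embedding table between encoder and decoder for brevity, and all names (`init_params`, `encode`, `sequence_log_prob`, the `bos` start symbol) are hypothetical. It shows only how a variable-length source is compressed into a fixed-length vector and how the conditional log-probability of a target sequence is then accumulated symbol by symbol.

```python
import numpy as np

def init_params(vocab_size, hidden_size, seed=0):
    """Randomly initialised weights; names and shapes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "E": rng.normal(0, s, (vocab_size, hidden_size)),       # symbol embeddings (shared)
        "W_enc": rng.normal(0, s, (hidden_size, hidden_size)),  # encoder recurrence
        "W_dec": rng.normal(0, s, (hidden_size, hidden_size)),  # decoder recurrence
        "C_dec": rng.normal(0, s, (hidden_size, hidden_size)),  # context -> decoder state
        "W_out": rng.normal(0, s, (hidden_size, vocab_size)),   # decoder state -> logits
    }

def encode(params, source):
    """Fold a variable-length list of symbol ids into one fixed-length vector."""
    h = np.zeros(params["W_enc"].shape[0])
    for sym in source:
        h = np.tanh(params["E"][sym] + h @ params["W_enc"])
    return h

def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def sequence_log_prob(params, source, target, bos=0):
    """log p(target | source), accumulated one target symbol at a time."""
    c = encode(params, source)
    h = np.tanh(c @ params["C_dec"])  # decoder state initialised from the context
    prev, total = bos, 0.0
    for sym in target:
        # Each step sees the previous symbol, the previous state, and the context.
        h = np.tanh(params["E"][prev] + h @ params["W_dec"] + c @ params["C_dec"])
        total += log_softmax(h @ params["W_out"])[sym]
        prev = sym
    return total
```

Joint training, as described in the abstract, would maximise `sequence_log_prob` over source-target pairs with respect to all entries of `params`; the gradient step is omitted here. In the phrase-table setting, the same score would be computed for each phrase pair and supplied as an extra feature.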
claims ledger
- method To validate whether the quality advantage of GS scenes translates to stronger navigation agents, we train five agent groups under different scene-domain mixtures, with training budget fixed at $5\times10^{7}$ steps: A: 100 mesh scenes, B: 100 GS scenes, C: 80M + 20G, D: 50M + 50G, and E: 20M + 80G. All agents share a unified DD-PPO [30] architecture with a ResNet [6] visual encoder and a GRU [4] policy head, receiving 256×256 RGB and depth observations, with only training scene composition varying. Each agent i
- method Recent works [54,62] have demonstrated that incorporating G-buffers and temporal cues can significantly improve reconstruction quality. Motivated by this, we propose a Geometry-Temporal Recurrent Refinement Network to further enhance the interpolated outputs. As illustrated in Fig. 3 (a), the proposed network employs a modified gated recurrent unit (GRU) [9] architecture. The network takes as input the current low-resolution frame I and its gradients G, the depth map D, the norm
- method and we map the state $H^t$ to queries, keys and values with affine projections using learned parameter matrices $W^Q \in \mathbb{R}^{d \times d/k}$, $W^K \in \mathbb{R}^{d \times d/k}$, $W^V \in \mathbb{R}^{d \times d/k}$ and $W^O \in \mathbb{R}^{d \times d}$. At step $t$, the UT then computes revised representations $H^t \in \mathbb{R}^{m \times d}$ for all $m$ input positions as follows: $H^t = \mathrm{LayerNorm}(A^t + \mathrm{Transition}(A^t))$ (4), where $A^t = \mathrm{LayerNorm}((H^{t-1} + P^t) + \mathrm{MultiHeadSelfAttention}(H^{t-1} + P^t))$ (5), where LayerNorm() is defined in Ba et al. (2016), and Transition() and $P^t$ are discussed below. Depending on the task, w
- dataset and the kinematics are deterministic; digits overlap without collision, and bouncing reflections are non-linear. (2) PhyWorld Collision 30K [Kang et al., 2025] comprises roughly 30 thousand simulated rigid-body collisions with varying ball radii and velocities. It tests OOD generalisation and strict physical consistency (momentum and kinetic energy conservation). (3) WeatherBench (2m temperature) [Rasp et al., 2020] is a processed version of the ERA5 archive [Hersbach et al., 2020], containing glob
- background Middle: attention-based learning enables adaptive and reliability-aware aggregation of heterogeneous measurements. Right: representative applications, including radio map reconstruction, LEO satellite localization, and map-informed resource allocation. To address these challenges, attention mechanisms have emerged as an effective framework for adaptive information aggregation. Originally developed for neural machine translation [29], [30] and later generalized by Transformers [31], attention c
- method $_{\theta_m}(z_t, a_t) \to \hat{z}_{t+1}$) predicts the next latent state after a primitive action, and the decoder ($\mathrm{Dec}_{\theta_d}(\hat{z}_{t+1}) \to \hat{s}_{t+1}$) reconstructs next-state information from the predicted latent. The purpose is not to use this model as a long-horizon planner, but to learn a dynamics-aware latent space where behaviorally similar states can merge and useful hubs can emerge. We use a GRU [9] memory module with clipped history so the latent can capture recent context without memorizing full trajectory identity. Ad
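The Universal Transformer excerpt in the ledger above gives one recurrent step as two nested layer-normalised residual updates (its equations 4 and 5). The following is a rough NumPy sketch of that step under stated assumptions: the layer normalisation omits the learned gain and bias of Ba et al. (2016), the attention uses standard scaled dot-product heads, and the transition function is passed in as a callable because the excerpt defers its definition. The function names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Simplified: no learned gain or bias (an assumption made for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multihead_self_attention(x, W_Q, W_K, W_V, W_O):
    # W_Q, W_K, W_V have shape (k, d, d/k): one projection per head. W_O is (d, d).
    heads = []
    for Wq, Wk, Wv in zip(W_Q, W_K, W_V):
        q, key, v = x @ Wq, x @ Wk, x @ Wv
        weights = softmax(q @ key.T / np.sqrt(q.shape[-1]))
        heads.append(weights @ v)
    return np.concatenate(heads, axis=-1) @ W_O

def ut_step(H_prev, P_t, attn_weights, transition):
    """One step: A_t = LN(x + MHSA(x)) with x = H_prev + P_t, then H_t = LN(A_t + T(A_t))."""
    x = H_prev + P_t
    A_t = layer_norm(x + multihead_self_attention(x, *attn_weights))
    return layer_norm(A_t + transition(A_t))
```

Calling `ut_step` repeatedly with the same `attn_weights` and `transition` but a step-dependent `P_t` reflects the weight sharing across steps that distinguishes this recurrence from a stack of independent Transformer layers.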
representative citing papers
CPF-GCD enforces low-rank compositional structure on vision backbone features via spatial primitive fields so that novel categories emerge as new activation patterns over a shared vocabulary of reusable visual primitives.
A unified algebraic account reduces RNN expressivity to syntactic monoid division in wreath products and shows diagonal state-space models realize every even-modulus counter under unsigned-integer quantization but none under floating-point recurrences.
NDR-SHKF replaces the static forgetting factor in Sage-Husa Kalman Filters with a learned vector-valued memory attenuation policy from a bifurcated recurrent network trained end-to-end on whitened innovations to minimize estimation error.
Nested-GPT is an autoregressive Transformer surrogate that generates variable-multiplicity parton showers while enforcing ordered Markovian branching and matches reference Monte Carlo results for leading-log non-global logarithm resummation in the large-Nc limit.
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
PSR-NQS makes recurrent neural quantum states scalable for variational Monte Carlo by using parallel scan recurrence, reaching accurate results on 52x52 two-dimensional lattices.
ZALT learns latent hub states and hub-to-hub dynamics from demonstrations to plan zero-shot solutions for unseen start-goal tasks, achieving 55% success in a 3D maze versus 6% for baselines.
NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.
In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.
FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.
Dilated RNN wave functions induce power-law correlations for the critical 1D transverse-field Ising model and the Cluster state, unlike the exponential decay of conventional RNN ansatze.
HealthPoint represents clinical events as points in a 4D space (content, time, modality, case) and applies low-rank relational attention to achieve state-of-the-art mortality prediction from multi-level incomplete multimodal EHRs.
Denoising particle filters train state estimators on individual transitions via score matching, then use the learned denoiser with a dynamics model to approximate Bayesian filtering step-by-step, matching end-to-end baselines while preserving composability.
MELT is the first behavioral trace dataset for high-risk memecoin launch detection on Solana, providing 122 features, risk annotations, and ML benchmarks that reduce investment loss when used for selection.
CogAlpha combines LLM reasoning with code-level evolutionary search to discover financial alphas that show higher predictive accuracy and generalization than prior methods on five stock datasets.
DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.
MDM is a classifier-free diffusion model that generates expressive human motions by predicting clean samples rather than noise, supporting text and action conditioning and outperforming prior methods on standard benchmarks.
DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.
B-MOD is a dataset of 19,728 mobile device photos of documents with precise text line annotations, released with a neural baseline that shows high error rates on harder images.
Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.
Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.
Warrant adds a query-item permission gate g_ij to attention value terms, improving primary metrics in 27 of 32 comparisons across CTDG, MTPP, RAG, STPP, and TKG tasks.
MoCo-AIS is a MoCo-based contrastive learning framework that learns vessel trajectory embeddings and improves similarity computation over baselines on large-scale real-world AIS datasets while offering a benchmarking platform.
citing papers explorer
- Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo
PSR-NQS makes recurrent neural quantum states scalable for variational Monte Carlo by using parallel scan recurrence, reaching accurate results on 52x52 two-dimensional lattices.
- Zero-shot Imitation Learning by Latent Topology Mapping
ZALT learns latent hub states and hub-to-hub dynamics from demonstrations to plan zero-shot solutions for unseen start-goal tasks, achieving 55% success in a 3D maze versus 6% for baselines.
- 3DGS$^3$: Joint Super Sampling and Frame Interpolation for Real-Time Large-Scale 3DGS Rendering
3DGS³ adds gradient-guided super-sampling and lightweight temporal interpolation to low-resolution 3DGS renders to produce high-resolution, high-frame-rate output without retraining the underlying scene representation.
- Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.
- Universal Transformers
Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting, are Turing-complete under certain assumptions, and outperform standard Transformers on algorithmic and language tasks.