PSR-NQS makes recurrent neural quantum states scalable for variational Monte Carlo by using parallel scan recurrence, reaching accurate results on 52x52 two-dimensional lattices.
hub
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
28 Pith papers cite this work. Polarity classification is still indexing.
abstract
In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
hub tools
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
ZALT learns latent hub states and hub-to-hub dynamics from demonstrations to plan zero-shot solutions for unseen start-goal tasks, achieving 55% success in a 3D maze versus 6% for baselines.
NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.
In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than sqrt(width n), with a critical regime at t ~ c sqrt(n) where finite-width effects emerge and dominate for larger t.
FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.
Dilated RNN wave functions induce power-law correlations for the critical 1D transverse-field Ising model and the Cluster state, unlike the exponential decay of conventional RNN ansatze.
HealthPoint represents clinical events as points in a 4D space (content, time, modality, case) and applies low-rank relational attention to achieve state-of-the-art mortality prediction from multi-level incomplete multimodal EHRs.
DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.
Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.
Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.
3DGS³ adds gradient-guided super-sampling and lightweight temporal interpolation to low-resolution 3DGS renders to produce high-resolution, high-frame-rate output without retraining the underlying scene representation.
Orthogonal unlearning updates plus server-side virtual clients enable effective user data removal in graph federated learning without major performance loss.
A deep kernel learning architecture with transformer feature extraction on clinical-BERT embeddings and Gaussian process backend identifies three glaucoma subgroups by decoupling progression trajectories from current visual acuity in multimodal EHR data.
IDOBE compiles over 10,000 epidemiological outbreaks into a public benchmark and shows that MLP-based models deliver the most robust short-term forecasts while statistical methods hold a slight edge pre-peak.
MATRIX embeds multi-layer watermarks in LLM-generated code via dual-channel constrained parity-check encoding, achieving 99.2% detection accuracy with 0-0.14% functionality loss and 7.7-26.67% better attack robustness than prior methods.
TET-LLM predicts MOOC satisfaction early via temporal event transformers on behavior, LLM embeddings on text, and topic distributions, beating baselines at RMSE 0.82 and AUC 0.77 for 7-day forecasts.
Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.
SAM 2 delivers more accurate video segmentation with 3x fewer user interactions and 6x faster image segmentation than the original SAM by training a streaming-memory transformer on the largest video segmentation dataset collected to date.
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
CodeBERT pre-trains a bimodal model on code and text pairs plus unimodal data to achieve state-of-the-art results on natural language code search and code documentation generation.
Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.
ConvRec applies hierarchical convolutional layers to generate compact sequence representations for attribute-aware sequential recommendation, achieving linear complexity and outperforming attention-based state-of-the-art models on four real-world datasets.
A self-supervised method learns a fixed set of disentangled fingerprint tokens from medical time series by combining reconstruction loss with a total coding rate diversity penalty, framed as a disentangled rate-distortion problem.
Pith review generated a malformed one-line summary.
citing papers explorer
-
Mastering Diverse Domains through World Models
DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.