citation dossier
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
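The architecture described above is compact enough to sketch. The following is a minimal, illustrative PyTorch version, not the paper's exact model: a GRU encoder compresses the source sequence into a single fixed-length state, and a GRU decoder initialized from that state is trained with teacher forcing to maximize the conditional likelihood of the target. (The original also feeds the summary vector into every decoder step; all names and sizes here are hypothetical.)

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Sketch of the RNN Encoder-Decoder: encode to one vector, decode from it."""
    def __init__(self, src_vocab, tgt_vocab, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # encode the whole source sequence into one fixed-length state c
        _, c = self.encoder(self.src_emb(src_ids))     # c: (1, batch, hidden)
        # decode conditioned on c (teacher forcing: gold target as input)
        dec_h, _ = self.decoder(self.tgt_emb(tgt_ids), c)
        return self.out(dec_h)                         # per-step vocab logits

# toy run: joint training maximizes log p(target | source)
model = EncoderDecoder(src_vocab=8000, tgt_vocab=8000)
src, tgt = torch.randint(0, 8000, (4, 12)), torch.randint(0, 8000, (4, 10))
loss = nn.functional.cross_entropy(model(src, tgt).transpose(1, 2), tgt)
```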
why this work matters in Pith
Pith has found this work cited in 27 reviewed papers. Its strongest current cluster is cs.LG (10 papers). The largest review-status bucket among citing papers is UNVERDICTED (21 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
representative citing papers
NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.
In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than the square root of the width n; finite-width effects emerge in a critical regime at t ~ c·sqrt(n) and dominate for larger t.
FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.
Dilated RNN wave functions induce power-law correlations for the critical 1D transverse-field Ising model and the Cluster state, unlike the exponential decay of conventional RNN ansätze.
HealthPoint represents clinical events as points in a 4D space (content, time, modality, case) and applies low-rank relational attention to achieve state-of-the-art mortality prediction from multi-level incomplete multimodal EHRs.
DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.
Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs; a minimal sketch of the mechanism follows this list.
Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage; a sketch of the training step follows this list.
3DGS³ adds gradient-guided super-sampling and lightweight temporal interpolation to low-resolution 3DGS renders to produce high-resolution, high-frame-rate output without retraining the underlying scene representation.
Orthogonal unlearning updates plus server-side virtual clients enable effective user data removal in graph federated learning without major performance loss.
A deep kernel learning architecture with transformer feature extraction on clinical-BERT embeddings and a Gaussian process backend identifies three glaucoma subgroups by decoupling progression trajectories from current visual acuity in multimodal EHR data.
IDOBE compiles over 10,000 epidemiological outbreaks into a public benchmark and shows that MLP-based models deliver the most robust short-term forecasts while statistical methods hold a slight edge pre-peak.
MATRIX embeds multi-layer watermarks in LLM-generated code via dual-channel constrained parity-check encoding, achieving 99.2% detection accuracy with 0-0.14% functionality loss and 7.7-26.67% better attack robustness than prior methods.
TET-LLM predicts MOOC satisfaction early via temporal event transformers on behavior, LLM embeddings on text, and topic distributions, beating baselines at RMSE 0.82 and AUC 0.77 for 7-day forecasts.
Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.
SAM 2 delivers more accurate video segmentation with 3x fewer user interactions and 6x faster image segmentation than the original SAM by training a streaming-memory transformer on the largest video segmentation dataset collected to date.
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
CodeBERT pre-trains a bimodal model on code and text pairs plus unimodal data to achieve state-of-the-art results on natural language code search and code documentation generation.
Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting, achieving Turing-completeness under certain assumptions and outperforming standard Transformers on algorithmic and language tasks.
ConvRec applies hierarchical convolutional layers to generate compact sequence representations for attribute-aware sequential recommendation, achieving linear complexity and outperforming attention-based state-of-the-art models on four real-world datasets.
A self-supervised method learns a fixed set of disentangled fingerprint tokens from medical time series by combining reconstruction loss with a total coding rate diversity penalty, framed as a disentangled rate-distortion problem.
The Transformer dispenses with recurrence and convolution entirely, relying on multi-head self-attention to set state-of-the-art machine translation results at a fraction of the training cost of prior recurrent and convolutional models.
Leaky RNNs improve grid-cell-like representations and path-integration accuracy by acting as a low-pass filter that stabilizes dynamics against noise.
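Two of the entries above describe mechanisms concretely enough for short worked examples. For Graph Attention Networks, this is a minimal single-head, dense-adjacency sketch, assuming the standard formulation e_ij = LeakyReLU(a^T [Wh_i || Wh_j]) normalized by a softmax over each node's neighborhood; class and variable names are illustrative, not taken from the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention: learnable coefficients over neighborhoods."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared transform
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention vector
        self.act = nn.LeakyReLU(0.2)

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) adjacency with self-loops
        Wh = self.W(h)
        N = Wh.size(0)
        # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]) for every node pair
        pairs = torch.cat([Wh.unsqueeze(1).expand(N, N, -1),
                           Wh.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = self.act(self.a(pairs)).squeeze(-1)
        # keep only real edges, then normalize per neighborhood
        alpha = F.softmax(e.masked_fill(adj == 0, float("-inf")), dim=1)
        return alpha @ Wh                                # weighted aggregation

# toy usage: 4 nodes, self-loops guarantee non-empty neighborhoods
adj = ((torch.rand(4, 4) > 0.5) | torch.eye(4, dtype=torch.bool)).float()
out = GATLayer(8, 16)(torch.randn(4, 8), adj)
```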
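Likewise, the Mixed Precision Training entry names three ingredients: FP16 arithmetic for most computation, FP32 master weights for accumulation, and loss scaling. Below is a minimal hand-rolled sketch of how they fit together, with a fixed rather than dynamic loss scale; modern frameworks package the same recipe (e.g. torch.cuda.amp), and FP16 kernels are intended for GPU execution. Everything here is illustrative.

```python
import torch

# FP16 working model plus FP32 "master" copies of every parameter
model_fp16 = torch.nn.Linear(128, 1).half()
master = [p.detach().clone().float().requires_grad_(True)
          for p in model_fp16.parameters()]
optimizer = torch.optim.SGD(master, lr=1e-3)

def train_step(x, y, loss_scale=1024.0):
    # forward and backward run in FP16; scaling the loss keeps small
    # gradients from underflowing FP16's narrow range
    loss = torch.nn.functional.mse_loss(model_fp16(x), y)
    (loss * loss_scale).backward()
    # hand gradients to the FP32 master copies, undoing the scale
    for p16, p32 in zip(model_fp16.parameters(), master):
        p32.grad = p16.grad.float() / loss_scale
        p16.grad = None
    optimizer.step()                     # the update accumulates in FP32
    optimizer.zero_grad()
    # refresh the FP16 working copy from the updated master weights
    with torch.no_grad():
        for p16, p32 in zip(model_fp16.parameters(), master):
            p16.copy_(p32)
    return loss.item()

train_step(torch.randn(8, 128).half(), torch.randn(8, 1).half())
```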
citing papers explorer
- Zero-shot Imitation Learning by Latent Topology Mapping
ZALT learns latent hub states and hub-to-hub dynamics from demonstrations to plan zero-shot solutions for unseen start-goal tasks, achieving 55% success in a 3D maze versus 6% for baselines.
- Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement
NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.
- How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences
In linear recurrent models, infinite-width signal propagation remains accurate only for depths t much smaller than the square root of the width n; finite-width effects emerge in a critical regime at t ~ c·sqrt(n) and dominate for larger t.
- FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning
FLUID is a continuous-time transformer using Liquid Attention Networks to model attention as stable ODE solutions that interpolate between discrete SDPA and CT-RNNs, with an explicit sink gate and liquid hyper-connections for better information flow.
- Geometry-Induced Long-Range Correlations in Recurrent Neural Network Quantum States
Dilated RNN wave functions induce power-law correlations for the critical 1D transverse-field Ising model and the Cluster state, unlike the exponential decay of conventional RNN ansätze.
- A Clinical Point Cloud Paradigm for In-Hospital Mortality Prediction from Multi-Level Incomplete Multimodal EHRs
HealthPoint represents clinical events as points in a 4D space (content, time, modality, case) and applies low-rank relational attention to achieve state-of-the-art mortality prediction from multi-level incomplete multimodal EHRs.
- Mastering Diverse Domains through World Models
DreamerV3 uses world models and robustness techniques to solve over 150 tasks across domains with a single configuration, including Minecraft diamond collection from scratch.
- Graph Attention Networks
Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.
- Mixed Precision Training
Mixed precision training uses FP16 for most computations, FP32 master weights for accumulation, and loss scaling to enable accurate training of large DNNs with halved memory usage.
- 3DGS³: Joint Super Sampling and Frame Interpolation for Real-Time Large-Scale 3DGS Rendering
3DGS³ adds gradient-guided super-sampling and lightweight temporal interpolation to low-resolution 3DGS renders to produce high-resolution, high-frame-rate output without retraining the underlying scene representation.
- Graph Federated Unlearning for Privacy Preservation
Orthogonal unlearning updates plus server-side virtual clients enable effective user data removal in graph federated learning without major performance loss.
- Deep Kernel Learning for Stratifying Glaucoma Trajectories
A deep kernel learning architecture with transformer feature extraction on clinical-BERT embeddings and a Gaussian process backend identifies three glaucoma subgroups by decoupling progression trajectories from current visual acuity in multimodal EHR data.
- IDOBE: Infectious Disease Outbreak forecasting Benchmark Ecosystem
IDOBE compiles over 10,000 epidemiological outbreaks into a public benchmark and shows that MLP-based models deliver the most robust short-term forecasts while statistical methods hold a slight edge pre-peak.
- MATRIX: Multi-Layer Code Watermarking via Dual-Channel Constrained Parity-Check Encoding
MATRIX embeds multi-layer watermarks in LLM-generated code via dual-channel constrained parity-check encoding, achieving 99.2% detection accuracy with 0-0.14% functionality loss and 7.7-26.67% better attack robustness than prior methods.
- Early-Warning Learner Satisfaction Forecasting in MOOCs via Temporal Event Transformers and LLM Text Embeddings
TET-LLM predicts MOOC satisfaction early via temporal event transformers on behavior, LLM embeddings on text, and topic distributions, beating baselines at RMSE 0.82 and AUC 0.77 for 7-day forecasts.
- Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.
- SAM 2: Segment Anything in Images and Videos
SAM 2 delivers more accurate video segmentation with 3x fewer user interactions and 6x faster image segmentation than the original SAM by training a streaming-memory transformer on the largest video segmentation dataset collected to date.
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages
CodeBERT pre-trains a bimodal model on code and text pairs plus unimodal data to achieve state-of-the-art results on natural language code search and code documentation generation.
- Universal Transformers
Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting, achieving Turing-completeness under certain assumptions and outperforming standard Transformers on algorithmic and language tasks.
- Rethinking Convolutional Networks for Attribute-Aware Sequential Recommendation
ConvRec applies hierarchical convolutional layers to generate compact sequence representations for attribute-aware sequential recommendation, achieving linear complexity and outperforming attention-based state-of-the-art models on four real-world datasets.
- Learning Fingerprints for Medical Time Series with Redundancy-Constrained Information Maximization
A self-supervised method learns a fixed set of disentangled fingerprint tokens from medical time series by combining reconstruction loss with a total coding rate diversity penalty, framed as a disentangled rate-distortion problem.
- Attention Is All You Need
The Transformer dispenses with recurrence and convolution entirely, relying on multi-head self-attention to set state-of-the-art machine translation results at a fraction of the training cost of prior recurrent and convolutional models (a minimal sketch of its core attention operation follows this list).
- Impact of leaky dynamics on predictive path integration accuracy in recurrent neural networks
Leaky RNNs improve grid-cell-like representations and path-integration accuracy by acting as a low-pass filter that stabilizes dynamics against noise.
- Beyond Isolated Clients: Integrating Graph-Based Embeddings into Event Sequence Models
Three strategies for adding graph embeddings to event sequence SSL models improve AUC by up to 2.3% on four financial and e-commerce datasets, with graph density determining the best integration approach.
- Learning-Based Spectrum Cartography in Low Earth Orbit Satellite Networks: An Overview
The paper surveys attention-based learning methods for spectrum cartography in LEO satellite networks to enable adaptive fusion of heterogeneous measurements for inference and resource allocation.
- Benchmarking PyCaret AutoML Against BiLSTM for Fine-Grained Emotion Classification: A Comparative Study on 20-Class Emotion Detection
BiLSTM achieves 89% accuracy and 0.89 weighted F1 on 20-class emotion detection, marginally outperforming SVM at 88.11% on a 79,595-sentence dataset.
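The Attention Is All You Need entry above reduces to one formula, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V, which multi-head attention applies in parallel over learned projections. A minimal sketch of that core operation follows; shapes and the toy usage are illustrative.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # (..., len_q, len_k)
    if mask is not None:                               # e.g. causal masking
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V           # weighted sum of values

# toy self-attention: batch of 2 sequences, 5 tokens, d_k = 64
x = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(x, x, x)
```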