RLA-WM predicts residual latent actions via flow matching to create visual feature world models that outperform prior feature-based and diffusion approaches while enabling offline video-based robot RL.
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 4
roles: background 1
polarities: background 1
citing papers explorer
- Learning Visual Feature-Based World Models via Residual Latent Action
  RLA-WM predicts residual latent actions via flow matching to create visual feature world models that outperform prior feature-based and diffusion approaches while enabling offline video-based robot RL.
- Gradient-Based Join Ordering
  Relaxing join orders to a differentiable soft adjacency matrix and optimizing with gradients plus a GNN cost model yields plans that match or beat discrete search while scaling better on graph datasets.
- Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination
  Dream-MPC refines policy-generated trajectories by gradient ascent in a latent world model with uncertainty regularization and temporal amortization, improving base policy performance and beating gradient-free MPC on 24 continuous control tasks.
- Emotion-Conditioned Short-Horizon Human Pose Forecasting with a Lightweight Predictive World Model
  Facial emotion embeddings improve short-term pose forecasting accuracy for emotion-driven motions when fused via normalized gating in a lightweight LSTM world model, but not with simple multimodal fusion.