hub Mixed citations

LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Lucas Maes, Quentin Le Lidec, Damien Scieur, Yann LeCun, Randall Balestriero · 2026 · cs.LG · arXiv 2603.19312

Mixed citation behavior. Most common role is background (44%).

54 Pith papers citing it

Background 44% of classified citations


abstract

Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pre-trained encoders, or auxiliary supervision to avoid representation collapse. In this work, we introduce LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer enforcing Gaussian-distributed latent embeddings. This reduces tunable loss hyperparameters from six to one compared to the only existing end-to-end alternative. With ~15M parameters trainable on a single GPU in a few hours, LeWM plans up to 48x faster than foundation-model-based world models while remaining competitive across diverse 2D and 3D control tasks. Beyond control, we show that LeWM's latent space encodes meaningful physical structure through probing of physical quantities. Surprise evaluation confirms that the model reliably detects physically implausible events.
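The two-term objective described in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the paper's implementation: the function names, the squared-error form of the prediction term, the simple moment-matching penalty standing in for the Gaussian regularizer, and the single weight `lam` are all hypothetical.

```python
import numpy as np


def prediction_loss(pred_next, target_next):
    """Mean squared error between predicted and encoded next-step embeddings."""
    return float(np.mean((pred_next - target_next) ** 2))


def gaussian_moment_penalty(z):
    """Stand-in regularizer (an assumption, not the paper's regularizer).

    Penalizes a batch of embeddings whose mean differs from zero or whose
    covariance differs from the identity, i.e. the first two moments of an
    isotropic Gaussian.
    """
    mean = z.mean(axis=0)
    centered = z - mean
    cov = centered.T @ centered / (z.shape[0] - 1)
    eye = np.eye(z.shape[1])
    return float(np.sum(mean ** 2) + np.mean((cov - eye) ** 2))


def two_term_loss(pred_next, target_next, z, lam=1.0):
    """Prediction term plus a weighted regularizer; `lam` is the only loss weight."""
    return prediction_loss(pred_next, target_next) + lam * gaussian_moment_penalty(z)


# Toy data: a Gaussian-like batch versus a fully collapsed batch of embeddings.
rng = np.random.default_rng(0)
z_spread = rng.standard_normal((512, 8))
z_collapsed = np.zeros((512, 8))

# A collapsed encoder makes next-embedding prediction trivially perfect ...
collapsed_pred = prediction_loss(z_collapsed, z_collapsed)
# ... but the regularizer assigns it a higher penalty than the spread batch.
collapsed_reg = gaussian_moment_penalty(z_collapsed)
spread_reg = gaussian_moment_penalty(z_spread)
```

On this toy data the collapsed batch reaches zero prediction loss yet incurs a larger penalty than the Gaussian-like batch, which is the intuition behind pairing a prediction loss with a distribution-matching regularizer to discourage representation collapse.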


citation-role summary

background 6 · method 3

citation-polarity summary

background 4 · use method 3 · unclear 2

representative citing papers

When Does LeJEPA Learn a World Model?

stat.ML · 2026-05-25 · unverdicted · novelty 8.0

LeJEPA achieves linear identifiability of latent variables uniquely when the latents are Gaussian in worlds with stationary additive-noise transitions.

LeVLJEPA: End-to-End Vision-Language Pretraining Without Negatives

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

LeVLJEPA is the first non-contrastive vision-language pretraining method that learns via cross-modal prediction without negatives, producing stronger dense features than contrastive baselines on VQA and segmentation tasks.

Targeting World Models to Compromise Robot Learning Pipelines

cs.RO · 2026-06-08 · unverdicted · novelty 7.0

World models introduce a stealthy poisoning vector into robot learning pipelines where malicious prompts or dynamics in teleoperated data activate only during synthetic trajectory generation, enabling backdoors in downstream policies.

X-Tokenizer: A Multimodal Action Tokenizer for Vision-Language-Action Pretraining

cs.CV · 2026-06-07 · unverdicted · novelty 7.0

X-Tokenizer creates semantic action tokens via asymmetric residual quantization and contrastive pretraining on large trajectory data, outperforming prior methods like FAST on robotic tasks.

Contrast encodes inductive bias: separating slow noise from dynamics in predictive representation learning

cs.LG · 2026-06-05 · conditional · novelty 7.0

Cross-trajectory negative sampling in contrastive predictive objectives causes encoding of slow noise over dynamics; intra-trajectory sampling eliminates the shortcut and recovers dynamical variables even under strong noise.

Learn from your own latents and not from tokens: A sample-complexity theory

cs.LG · 2026-05-26 · unverdicted · novelty 7.0

Latent prediction SSL recovers latent trees from PCFG data with sample complexity constant in hierarchy depth L (up to logs), unlike exponential for token-level or supervised methods.

JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.

ProteinJEPA: Latent prediction complements protein language models

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Masked-position MLM plus JEPA latent prediction outperforms MLM-only pretraining on 10-11 of 16 downstream tasks for 35M-150M protein models while JEPA alone fails.

AGWM: Affordance-Grounded World Models for Environments with Compositional Prerequisites

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

AGWM improves world model accuracy in compositional environments by learning an explicit DAG of action affordance prerequisites to handle dynamic executability.

Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement

cs.CV · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.

Learning to Theorize the World from Observation

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.

Latent State Design for World Models under Sufficiency Constraints

cs.AI · 2026-05-03 · unverdicted · novelty 7.0

World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS

cs.RO · 2026-04-13 · unverdicted · novelty 7.0

3D-ALP achieves 0.65 success on memory-dependent 5-step robotic reach tasks versus near-zero for reactive baselines by anchoring MCTS planning to a persistent 3D camera-to-world frame.

ACID: Action Consistency via Inverse Dynamics for Planning with World Models

cs.RO · 2026-07-02 · unverdicted · novelty 6.0

ACID improves decision-time planning in world models by adding per-step action consistency residuals from an inverse dynamics model to the planning cost via an adaptive weight, yielding better performance with less compute across manipulation and navigation tasks.

UniTacVLA: Unified Tactile Understanding and Prediction in Vision Language Action Models

cs.RO · 2026-06-30 · unverdicted · novelty 6.0

UniTacVLA builds a state-aware and dynamics-aware tactile prior via unified latent space, tactile chain-of-thought, and mixed real/predicted feedback controller to boost dexterous manipulation performance.

Delta-JEPA: Learning Action-Sensitive World Models via Latent Difference Decoding

cs.AI · 2026-06-30 · unverdicted · novelty 6.0

Delta-JEPA augments latent forward prediction with a Latent Difference Action Decoder that reconstructs actions from embedding displacements, yielding action-sensitive world models that improve planning on four visual continuous-control tasks over JEPA baselines.

ScaleAware-JEPA: Latent Representation for Discovery in Multiscale Physical Fields

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

ScaleAware-JEPA combines Constrained Diffusion Decomposition with a scale-tied JEPA objective to learn label-free latent coordinates that recover coherent morphology in multiscale fields such as MHD turbulence and interstellar gas.

Flow Matching in Feature Space for Stochastic World Modeling

cs.CV · 2026-06-27 · unverdicted · novelty 6.0

FlowWM applies flow matching directly in pretrained feature space with a one-step projection mechanism, improving perception accuracy, mode coverage, and horizon robustness on synthetic and real-world benchmarks.

TacForeSight: Force-Guided Tactile World Model for Contact-Rich Manipulation

cs.RO · 2026-06-09 · unverdicted · novelty 6.0

TacForeSight trains a force-conditioned tactile world model to predict latent dynamics and uses those predictions as anticipatory priors inside a visuo-tactile policy for real-time contact-rich manipulation.

ω-EVA: Envision, Verify, and Act with Latent Interactive World Models

cs.RO · 2026-06-08 · unverdicted · novelty 6.0

ω-EVA is a three-stage latent world model framework that trains action-conditioned dynamics, a language-conditioned flow policy, and a tri-branch refiner to improve embodied action generation in simulation.

FF-JEPA: Long-Horizon Planning in World Models with Latent Planners

cs.AI · 2026-06-08 · unverdicted · novelty 6.0

FF-JEPA introduces a two-model hierarchical structure with an action-free latent planner to decompose long-horizon planning into short subgoals in latent world models.

Dexterity-BEV: Aligning 3D World and Actions for Generalizable Robot Policies Learning

cs.RO · 2026-06-01 · unverdicted · novelty 6.0

Dexterity-BEV creates 3D vertex-based inputs and BEV-aligned outputs to reduce spatial-temporal misalignments in end-to-end robot policies trained on diverse datasets and embodiments.

Beyond Task Success: Behavioral and Representational Diagnostics for WAM and VLA

cs.RO · 2026-05-31 · unverdicted · novelty 6.0

Empirical study introduces behavioral and representational diagnostics showing architecture-dependent gains in object targeting and predictive structure for WAMs over VLAs on LIBERO and RoboTwin2.0.

StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement

cs.CV · 2026-05-29 · unverdicted · novelty 6.0

StressDream optimizes initial noise in diffusion video world models using VLM semantic and plausibility objectives to steer generations toward specified high-impact outcomes for improved policy evaluation.

citing papers explorer

Showing 50 of 54 citing papers after filters.

When Does LeJEPA Learn a World Model? stat.ML · 2026-05-25 · unverdicted · none · ref 9 · internal anchor
LeJEPA achieves linear identifiability of latent variables uniquely when the latents are Gaussian in worlds with stationary additive-noise transitions.
LeVLJEPA: End-to-End Vision-Language Pretraining Without Negatives cs.CV · 2026-07-01 · unverdicted · none · ref 21 · internal anchor
LeVLJEPA is the first non-contrastive vision-language pretraining method that learns via cross-modal prediction without negatives, producing stronger dense features than contrastive baselines on VQA and segmentation tasks.
Targeting World Models to Compromise Robot Learning Pipelines cs.RO · 2026-06-08 · unverdicted · none · ref 18 · internal anchor
World models introduce a stealthy poisoning vector into robot learning pipelines where malicious prompts or dynamics in teleoperated data activate only during synthetic trajectory generation, enabling backdoors in downstream policies.
X-Tokenizer: A Multimodal Action Tokenizer for Vision-Language-Action Pretraining cs.CV · 2026-06-07 · unverdicted · none · ref 9 · internal anchor
X-Tokenizer creates semantic action tokens via asymmetric residual quantization and contrastive pretraining on large trajectory data, outperforming prior methods like FAST on robotic tasks.
Contrast encodes inductive bias: separating slow noise from dynamics in predictive representation learning cs.LG · 2026-06-05 · conditional · none · ref 18 · internal anchor
Cross-trajectory negative sampling in contrastive predictive objectives causes encoding of slow noise over dynamics; intra-trajectory sampling eliminates the shortcut and recovers dynamical variables even under strong noise.
Learn from your own latents and not from tokens: A sample-complexity theory cs.LG · 2026-05-26 · unverdicted · none · ref 52 · internal anchor
Latent prediction SSL recovers latent trees from PCFG data with sample complexity constant in hierarchy depth L (up to logs), unlike exponential for token-level or supervised methods.
JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning cs.LG · 2026-05-13 · unverdicted · none · ref 23 · internal anchor
JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampling than pixel diffusion baselines.
ProteinJEPA: Latent prediction complements protein language models cs.LG · 2026-05-08 · unverdicted · none · ref 10 · internal anchor
Masked-position MLM plus JEPA latent prediction outperforms MLM-only pretraining on 10-11 of 16 downstream tasks for 35M-150M protein models while JEPA alone fails.
AGWM: Affordance-Grounded World Models for Environments with Compositional Prerequisites cs.AI · 2026-05-07 · unverdicted · none · ref 64 · internal anchor
AGWM improves world model accuracy in compositional environments by learning an explicit DAG of action affordance prerequisites to handle dynamic executability.
Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement cs.CV · 2026-05-07 · unverdicted · none · ref 16 · 2 links · internal anchor
NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.
Learning to Theorize the World from Observation cs.LG · 2026-05-05 · unverdicted · none · ref 2 · internal anchor
NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.
Latent State Design for World Models under Sufficiency Constraints cs.AI · 2026-05-03 · unverdicted · none · ref 44 · internal anchor
World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS cs.RO · 2026-04-13 · unverdicted · none · ref 13 · internal anchor
3D-ALP achieves 0.65 success on memory-dependent 5-step robotic reach tasks versus near-zero for reactive baselines by anchoring MCTS planning to a persistent 3D camera-to-world frame.
ACID: Action Consistency via Inverse Dynamics for Planning with World Models cs.RO · 2026-07-02 · unverdicted · none · ref 5 · internal anchor
ACID improves decision-time planning in world models by adding per-step action consistency residuals from an inverse dynamics model to the planning cost via an adaptive weight, yielding better performance with less compute across manipulation and navigation tasks.
UniTacVLA: Unified Tactile Understanding and Prediction in Vision Language Action Models cs.RO · 2026-06-30 · unverdicted · none · ref 38 · internal anchor
UniTacVLA builds a state-aware and dynamics-aware tactile prior via unified latent space, tactile chain-of-thought, and mixed real/predicted feedback controller to boost dexterous manipulation performance.
Delta-JEPA: Learning Action-Sensitive World Models via Latent Difference Decoding cs.AI · 2026-06-30 · unverdicted · none · ref 9 · internal anchor
Delta-JEPA augments latent forward prediction with a Latent Difference Action Decoder that reconstructs actions from embedding displacements, yielding action-sensitive world models that improve planning on four visual continuous-control tasks over JEPA baselines.
ScaleAware-JEPA: Latent Representation for Discovery in Multiscale Physical Fields cs.LG · 2026-06-29 · unverdicted · none · ref 7 · internal anchor
ScaleAware-JEPA combines Constrained Diffusion Decomposition with a scale-tied JEPA objective to learn label-free latent coordinates that recover coherent morphology in multiscale fields such as MHD turbulence and interstellar gas.
Flow Matching in Feature Space for Stochastic World Modeling cs.CV · 2026-06-27 · unverdicted · none · ref 3 · internal anchor
FlowWM applies flow matching directly in pretrained feature space with a one-step projection mechanism, improving perception accuracy, mode coverage, and horizon robustness on synthetic and real-world benchmarks.
TacForeSight: Force-Guided Tactile World Model for Contact-Rich Manipulation cs.RO · 2026-06-09 · unverdicted · none · ref 33 · internal anchor
TacForeSight trains a force-conditioned tactile world model to predict latent dynamics and uses those predictions as anticipatory priors inside a visuo-tactile policy for real-time contact-rich manipulation.
ω-EVA: Envision, Verify, and Act with Latent Interactive World Models cs.RO · 2026-06-08 · unverdicted · none · ref 26 · internal anchor
ω-EVA is a three-stage latent world model framework that trains action-conditioned dynamics, a language-conditioned flow policy, and a tri-branch refiner to improve embodied action generation in simulation.
FF-JEPA: Long-Horizon Planning in World Models with Latent Planners cs.AI · 2026-06-08 · unverdicted · none · ref 6 · internal anchor
FF-JEPA introduces a two-model hierarchical structure with an action-free latent planner to decompose long-horizon planning into short subgoals in latent world models.
Dexterity-BEV: Aligning 3D World and Actions for Generalizable Robot Policies Learning cs.RO · 2026-06-01 · unverdicted · none · ref 10 · internal anchor
Dexterity-BEV creates 3D vertex-based inputs and BEV-aligned outputs to reduce spatial-temporal misalignments in end-to-end robot policies trained on diverse datasets and embodiments.
Beyond Task Success: Behavioral and Representational Diagnostics for WAM and VLA cs.RO · 2026-05-31 · unverdicted · none · ref 24 · internal anchor
Empirical study introduces behavioral and representational diagnostics showing architecture-dependent gains in object targeting and predictive structure for WAMs over VLAs on LIBERO and RoboTwin2.0.
StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement cs.CV · 2026-05-29 · unverdicted · none · ref 67 · internal anchor
StressDream optimizes initial noise in diffusion video world models using VLM semantic and plausibility objectives to steer generations toward specified high-impact outcomes for improved policy evaluation.
LaMo: Self-Supervised Latent Motion Priors for Physical Realism in Video Generation cs.CV · 2026-05-22 · unverdicted · none · ref 15 · internal anchor
LaMo adds self-supervised latent motion priors via a motion drift loss during training and motion prior guidance during sampling to boost physical fidelity in video diffusion models like CogVideoX.
Crys-JEPA: Accelerating Crystal Discovery via Embedding Screening and Generative Refinement cs.LG · 2026-05-14 · unverdicted · none · ref 30 · internal anchor
Crys-JEPA introduces a joint embedding predictive architecture that creates an energy-aware latent space, enabling embedding-based stability screening and a refinement pipeline that yields up to 72.7% gains on the V.S.U.N. metric for crystal generation.
SCAR: Self-Supervised Continuous Action Representation Learning cs.RO · 2026-05-13 · unverdicted · none · ref 25 · internal anchor
SCAR proposes a joint inverse-forward dynamics framework to learn transferable continuous action representations across embodiments from visual data using regularization and adversarial invariance.
Do multimodal models imagine electric sheep? cs.CV · 2026-05-10 · conditional · none · ref 38 · internal anchor
Fine-tuning VLMs to output action sequences for puzzles causes emergent internal visual representations that improve performance when integrated into reasoning.
Predictive but Not Plannable: RC-aux for Latent World Models cs.LG · 2026-05-08 · unverdicted · none · ref 29 · internal anchor
RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
AeroJEPA: Learning Semantic Latent Representations for Scalable 3D Aerodynamic Field Modeling cs.LG · 2026-05-07 · unverdicted · none · ref 6 · internal anchor
AeroJEPA applies joint-embedding predictive learning to produce scalable, semantically organized latent representations for 3D aerodynamic fields that support both field reconstruction and downstream design tasks.
Information bottleneck for learning the phase space of dynamics from high-dimensional experimental data physics.data-an · 2026-04-27 · unverdicted · none · ref 41 · internal anchor
DySIB recovers the two-dimensional phase space of a physical pendulum from experimental video by optimizing a symmetric information bottleneck objective entirely in latent space.
Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity cs.LG · 2026-04-20 · unverdicted · none · ref 24 · internal anchor
Sonata is a small hybrid world model pre-trained to predict future IMU states that outperforms autoregressive baselines on clinical discrimination, fall-risk prediction, and cross-cohort transfer while fitting on-device wearables.
Metriplector: From Field Theory to Neural Architecture cs.AI · 2026-03-31 · unverdicted · none · ref 8 · internal anchor
Metriplector treats neural computation as coupled metriplectic field dynamics whose stress-energy tensor readout achieves competitive results on vision, control, Sudoku, language modeling, and pathfinding with small parameter counts.
Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms eess.IV · 2026-03-30 · unverdicted · none · ref 224 · internal anchor
Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.
Valdi: Value Diffusion World Models cs.LG · 2026-07-01 · unverdicted · none · ref 19 · internal anchor
Valdi pairs a latent diffusion dynamics model with end-to-end MPC training and reports that one diffusion step matches an MLP baseline on CarRacing while exposing a multimodality-control trade-off.
AdaJEPA: An Adaptive Latent World Model cs.LG · 2026-06-30 · unverdicted · none · ref 44 · internal anchor
AdaJEPA performs closed-loop test-time adaptation of latent world models during MPC by executing an action chunk, observing the transition, and taking one gradient step on the model before replanning, yielding higher goal-reaching success.
How Should World Models Be Evaluated for Embodied Decision-Making? A Decision-Making-Centric Position cs.LG · 2026-06-13 · unverdicted · none · ref 43 · internal anchor
The paper proposes an L0-L7 evidential ladder for evaluating world models in embodied decision-making, prioritizing interventional action fidelity and policy optimization utility over visual plausibility.
PLUME: Probabilistic Latent Unified World Modeling and Parameter Estimation for Multi-Finger Manipulation cs.RO · 2026-06-09 · unverdicted · none · ref 12 · internal anchor
PLUME jointly models parameter beliefs and conditioned dynamics in a latent space for dexterous manipulation, enabling zero-shot sim-to-real transfer that outperforms offline RL and behavior cloning baselines on turning, lifting, and flicking tasks.
PRISM: PRior-guided Imagination Sampling in world Models cs.RO · 2026-06-06 · unverdicted · none · ref 8 · internal anchor
PRISM derives a state-conditioned action prior from a world model's encoder and integrates it into sampling-based planning via product-of-Gaussians fusion, claiming 35 and 32 percentage point gains on Cube and PushT tasks.
Generalization of World Models under Environmental Variability for Vision-based Quadrotor Navigation cs.RO · 2026-06-03 · unverdicted · none · ref 8 · internal anchor
Robustness of world models during cross-environment SSL pretraining predicts sim-to-real transfer success for quadrotor navigation, with discrete latent size and training sequence length as dominant factors.
IMWM: Intuition Models Complement World Models for Latent Planning cs.LG · 2026-06-01 · unverdicted · none · ref 1 · internal anchor
IMWM combines a world model with an intuition model from demonstrations to improve sample-based latent planning success rates over world-model-only baselines on pixel control tasks.
Beyond Euclidean Proximity: Repairing Latent World Models with Horizon-Matched Trajectory Reachability Metrics cs.LG · 2026-05-21 · unverdicted · none · ref 9 · internal anchor
TRM trains a small horizon-matched pairwise head on trajectory data to improve terminal-state ranking in latent MPC, raising success from 7% to 97% on TwoRoom and 32.7% to 84% on PLDM without changing the encoder or dynamics.
ChronoMedicalWorld: A Medical World Model for Learning Patient Trajectories from Longitudinal Care Data cs.LG · 2026-05-21 · unverdicted · none · ref 15 · internal anchor
CMWM is a recurrent latent world model for forecasting patient trajectories like annual eGFR in CKD, reporting 7.28% lower MAE than a tuned GPT-5.5 baseline on a 2232-patient cohort with gains from dialogue data.
stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation cs.LG · 2026-05-20 · unverdicted · none · ref 21 · internal anchor
The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.
Pelican-Unify 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action cs.RO · 2026-05-14 · unverdicted · none · ref 31 · 2 links · internal anchor
A unified embodied foundation model uses one VLM for understanding and reasoning plus a joint video-action future generator, reporting competitive scores on VLM, world modeling, and robot benchmarks without apparent compromise.
Is the Future Compatible? Diagnosing Dynamic Consistency in World Action Models cs.RO · 2026-05-08 · unverdicted · none · ref 28 · internal anchor
Action-state consistency in World Action Models distinguishes successful from failed imagined futures and supports value-free selection of better rollouts via consensus among predictions.
ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation cs.CV · 2026-05-08 · unverdicted · none · ref 20 · internal anchor
ST-Gen4D uses a world model that fuses global appearance and local dynamic graphs into a 4D cognition representation to guide consistent 4D Gaussian generation.
Detecting is Easy, Adapting is Hard: Local Expert Growth for Visual Model-Based Reinforcement Learning under Distribution Shift cs.LG · 2026-04-30 · unverdicted · none · ref 7 · internal anchor
JEPA-Indexed Local Expert Growth adds local action corrections for detected shift clusters and yields statistically significant OOD gains on four shift conditions while keeping in-distribution performance intact.
Towards World Models in Biomedical Research cs.AI · 2026-06-04 · unverdicted · none · ref 48 · internal anchor
Proposes biomedical world models that learn latent states and intervention-conditioned dynamics to enable simulation of future biological trajectories for discovery in virtual cells, organoids, patients, and surgery.
WALL-WM: Carving World Action Modeling at the Event Joints cs.RO · 2026-06-01 · unverdicted · none · ref 54 · internal anchor
WALL-WM introduces event-grounded Vision-Language-Action pretraining that uses semantic events as the atomic unit to address granularity mismatch in world action models and reports state-of-the-art generalization.
