WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World


arxiv: 2512.10958 · v2 · pith:5PSTYZAHnew · submitted 2025-12-11 · 💻 cs.CV


Ao Liang, Lingdong Kong, Tianyi Yan, Hongsi Liu, Wesley Yang, Ziqi Huang, Wei Yin, Jialong Zuo, Yixuan Hu, Dekai Zhu, Dongyue Lu, Youquan Liu, Guangfeng Jiang, Linfeng Li, Xiangtai Li, Long Zhuo, Lai Xing Ng, Benoit R. Cottereau, Changxin Gao, Liang Pan, Wei Tsang Ooi, Ziwei Liu



keywords: world, model, models, real, benchmark, dataset, driving, fidelity



Generative world models are reshaping embodied AI, enabling agents to synthesize realistic 4D driving environments that look convincing but often fail physically or behaviorally. Despite rapid progress, the field still lacks a unified way to assess whether generated worlds preserve geometry, obey physics, or support reliable control. We introduce WorldLens, a full-spectrum benchmark evaluating how well a model builds, understands, and behaves within its generated world. It spans five aspects -- Generation, Reconstruction, Action-Following, Downstream Task, and Human Preference -- jointly covering visual realism, geometric consistency, physical plausibility, and functional reliability. Across these dimensions, no existing world model excels universally: those with strong textures often violate physics, while geometry-stable ones lack behavioral fidelity. To align objective metrics with human judgment, we further construct WorldLens-26K, a large-scale dataset of human-annotated videos with numerical scores and textual rationales, and develop WorldLens-Agent, an evaluation model distilled from these annotations to enable scalable, explainable scoring. Together, the benchmark, dataset, and agent form a unified ecosystem for measuring world fidelity -- standardizing how future models are judged not only by how real they look, but by how real they behave.


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Articulated Kinematics to Routed Visual Control for Action-Conditioned Surgical Video Generation
cs.CV 2026-05 unverdicted novelty 7.0

A kinematic-to-visual lifting paradigm combined with hierarchically routed control generates action-conditioned surgical videos with better faithfulness, fidelity, and efficiency.

OmniLiDAR: A Unified Diffusion Framework for Multi-Domain 3D LiDAR Generation
cs.CV 2026-05 conditional novelty 6.0

A unified text-conditioned diffusion model generates high-fidelity LiDAR scans across eight domains spanning weather, sensor, and platform shifts using cross-domain training and feature modeling.

Human Cognition in Machines: A Unified Perspective of World Models
cs.RO 2026-04 unverdicted novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and pro...