Recognition: unknown
OVPD: A Virtual-Physical Fusion Testing Dataset of OnSite Autonomous Driving Challenge
Pith reviewed 2026-05-10 00:13 UTC · model grok-4.3
The pith
OVPD fuses virtual background traffic with real-vehicle-in-the-loop testing to create controllable closed-loop environments on a proving ground.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OVPD is a virtual-physical fusion testing dataset that integrates virtual background traffic with vehicle-infrastructure perception to build controllable and interactive closed-loop test environments on a proving ground, delivering 20 testing clips totaling nearly three hours of multi-modal data including vehicle trajectories and states, control commands, and digital-twin-rendered surround-view observations.
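As a rough sizing aid: the paper text quoted in the reference graph below states that ego control commands and motion states are logged at 10 Hz. The sketch below assumes that rate holds for every clip, reads "nearly three hours" as an upper bound of 3 h, and treats the 20 clips as equal in length. The equal-length assumption is hypothetical, because per-clip durations are not reported on this page.

```python
from typing import Tuple

# Rough sizing of the OVPD ego state/control stream.
# Assumptions (ours, not from the dataset docs): a uniform 10 Hz log rate,
# 3 h as an upper bound for "nearly three hours", and equal-length clips.
LOG_RATE_HZ = 10
TOTAL_HOURS = 3.0
NUM_CLIPS = 20


def frame_budget(rate_hz: float, hours: float, clips: int) -> Tuple[int, float, float]:
    """Return (total frames, frames per clip, minutes per clip)."""
    total_frames = int(rate_hz * hours * 3600)
    frames_per_clip = total_frames / clips
    minutes_per_clip = hours * 60 / clips
    return total_frames, frames_per_clip, minutes_per_clip


total, per_clip, minutes = frame_budget(LOG_RATE_HZ, TOTAL_HOURS, NUM_CLIPS)
print(f"{total} frames total, ~{per_clip:.0f} per clip, ~{minutes:.0f} min per clip")
# -> 108000 frames total, ~5400 per clip, ~9 min per clip
```

At most about 10^5 state/control records is a modest volume, small enough to replay and inspect clip by clip.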
What carries the argument
The vehicle-in-the-loop framework that merges real vehicle states and control commands with digital-twin-rendered virtual traffic and infrastructure perception to produce interactive closed-loop scenarios.
If this is right
- Enables long-tail planning and decision-making validation in interactive settings
- Supports both open-loop and platform-enabled closed-loop evaluation
- Allows comprehensive assessment across safety, efficiency, comfort, rule compliance, and traffic impact
- Supplies traceable data for diagnosing specific failures and guiding iterative algorithm improvements
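The comfort dimension is described in the paper text quoted in the reference graph (entry [9]). It has a maximum of 10 points, and a threshold-based scheme penalizes the fraction of time that any comfort threshold (longitudinal/lateral acceleration, jerk, yaw rate) is exceeded, until the score is exhausted. Below is a minimal sketch of one reading of that rule. The threshold values and the linear `penalty_scale` are hypothetical placeholders because the actual values are not given on this page. The samples are assumed to be uniformly spaced, so a fraction of samples equals a fraction of time.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ComfortThresholds:
    # Hypothetical limits; OVPD's actual thresholds are not stated on this page.
    max_abs_lon_acc: float = 2.0   # m/s^2
    max_abs_lat_acc: float = 2.0   # m/s^2
    max_abs_jerk: float = 4.0      # m/s^3
    max_abs_yaw_rate: float = 0.5  # rad/s


@dataclass(frozen=True)
class Sample:
    lon_acc: float
    lat_acc: float
    jerk: float
    yaw_rate: float


def exceeds_any(s: Sample, t: ComfortThresholds) -> bool:
    """True if at least one comfort signal is over its limit at this sample."""
    return (abs(s.lon_acc) > t.max_abs_lon_acc
            or abs(s.lat_acc) > t.max_abs_lat_acc
            or abs(s.jerk) > t.max_abs_jerk
            or abs(s.yaw_rate) > t.max_abs_yaw_rate)


def comfort_score(samples: Sequence[Sample],
                  t: ComfortThresholds = ComfortThresholds(),
                  max_score: float = 10.0,
                  penalty_scale: float = 10.0) -> float:
    """max_score minus penalty_scale * (fraction of samples over any limit), floored at 0.

    Assumes uniformly spaced samples, so a sample fraction equals a time fraction.
    penalty_scale is a hypothetical knob, not a published OVPD value.
    """
    if not samples:
        return max_score
    fraction = sum(exceeds_any(s, t) for s in samples) / len(samples)
    return max(0.0, max_score - penalty_scale * fraction)


# One of four samples has a harsh longitudinal acceleration -> 25% of time penalised.
calm = Sample(lon_acc=0.5, lat_acc=0.2, jerk=1.0, yaw_rate=0.1)
harsh = Sample(lon_acc=3.0, lat_acc=0.0, jerk=0.0, yaw_rate=0.0)
print(comfort_score([calm, calm, calm, harsh]))  # -> 7.5
```

A larger `penalty_scale` exhausts the score sooner. At 40, for example, the same 25% exceedance already drives the score to 0.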
Where Pith is reading between the lines
- The same fusion method could be applied to test interactions with live traffic infrastructure outside controlled proving grounds
- OVPD clips could serve as a benchmark to measure how well purely simulated environments predict real-vehicle outcomes
- Future challenges might expand the atomic scenario set using the same virtual-physical template to cover additional edge cases
Load-bearing premise
Existing public datasets lack sufficient real vehicle dynamics feedback and closed-loop interaction, and the virtual-physical fusion described here fills that gap without introducing new biases or limitations.
What would settle it
A side-by-side comparison in which the same autonomous driving stack is run on OVPD clips and then on equivalent real-world road segments without the virtual-physical setup, checking whether failure modes and performance rankings match.
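One way to test whether "performance rankings match" is to compute a rank correlation between per-team scores on OVPD clips and per-team scores on the real-road runs. The sketch below computes Spearman's ρ and assigns tied scores their average rank. Choosing Spearman's ρ over an alternative such as Kendall's τ is our assumption, and the example scores are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank scores so that 1 = best (highest); tied scores share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 teams")
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("all scores tied in one list; rho is undefined")
    return cov / (var_a * var_b) ** 0.5


# Hypothetical per-team overall scores (same team order in both lists).
ovpd_scores = [89.4, 88.1, 85.6, 84.0]
road_scores = [86.0, 87.2, 80.5, 79.9]
print(round(spearman_rho(ovpd_scores, road_scores), 3))  # -> 0.8
```

A ρ close to 1 would suggest that OVPD preserves the team ordering seen on real roads. The same comparison could be repeated per dimension (safety, comfort, and so on) to locate where the two settings disagree.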
Original abstract
The rapid iteration of autonomous driving algorithms has created a growing demand for high-fidelity, replayable, and diagnosable testing data. However, many public datasets lack real vehicle dynamics feedback and closed-loop interaction with surrounding traffic and road infrastructure, limiting their ability to reflect deployment readiness. To address this gap, we present OVPD (OnSite Virtual-Physical Dataset), a virtual-physical fusion testing dataset released from the 2025 OnSite Autonomous Driving Challenge. Centered on real-vehicle-in-the-loop testing, OVPD integrates virtual background traffic with vehicle-infrastructure perception to build controllable and interactive closed-loop test environments on a proving ground. The dataset contains 20 testing clips from 20 teams over a scenario chain of 15 atomic scenarios, totaling nearly 3 hours of multi-modal data, including vehicle trajectories and states, control commands, and digital-twin-rendered surround-view observations. OVPD supports long-tail planning and decision-making validation, open-loop or platform-enabled closed-loop evaluation, and comprehensive assessment across safety, efficiency, comfort, rule compliance, and traffic impact, providing actionable evidence for failure diagnosis and iterative improvement. The dataset is available via: https://huggingface.co/datasets/Yuhang253820/Onsite_OPVD
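In the open-loop mode, the paper's related-work discussion (quoted in the reference graph, entry [4]) notes that nuScenes-style benchmarks score planned trajectories by their regression error against logged ground truth, such as the L2 error. The sketch below shows that metric in its simplest form. It assumes that planned and logged ego positions have already been resampled to the same timestamps and expressed in one coordinate frame. OVPD's actual file schema is not described on this page, so the input format is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in metres, one shared coordinate frame


def mean_l2_error(planned: Sequence[Point], logged: Sequence[Point]) -> float:
    """Average Euclidean (L2) distance between time-aligned planned and logged positions."""
    if len(planned) != len(logged) or not planned:
        raise ValueError("trajectories must be non-empty and sampled at the same timestamps")
    return sum(math.dist(p, q) for p, q in zip(planned, logged)) / len(planned)


# Hypothetical 3-step horizon at 0.1 s spacing: the plan drifts 1 m laterally at the second step.
planned = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
logged = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(round(mean_l2_error(planned, logged), 3))  # -> 0.333
```

This metric is cheap to compute, but it only measures agreement with what the logged vehicle did. It cannot show how surrounding traffic would have reacted to a different plan, which is the gap the closed-loop mode is meant to cover.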
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents OVPD (OnSite Virtual-Physical Dataset), a virtual-physical fusion testing dataset released from the 2025 OnSite Autonomous Driving Challenge. It claims to integrate virtual background traffic with real vehicle-in-the-loop testing and vehicle-infrastructure perception to create controllable, interactive closed-loop test environments on a proving ground. The dataset includes 20 testing clips from 20 teams spanning 15 atomic scenarios, totaling nearly 3 hours of multi-modal data (vehicle trajectories/states, control commands, and digital-twin-rendered surround-view observations). OVPD is positioned to enable long-tail planning validation, open- or closed-loop evaluation, and multi-metric assessment across safety, efficiency, comfort, rule compliance, and traffic impact, with the data publicly available on HuggingFace.
Significance. If the virtual-physical fusion and data quality are as described and rigorously ensured, OVPD would represent a meaningful addition to autonomous driving testing resources by supplying replayable, diagnosable closed-loop scenarios that incorporate real vehicle dynamics feedback—features often absent from purely virtual or open-loop public datasets. The multi-metric evaluation framework and public release support iterative algorithm development and failure analysis. The dataset's origin in a competitive challenge setting adds practical relevance for deployment-oriented validation.
major comments (2)
- [Abstract] Abstract: The central claim that OVPD 'integrates virtual background traffic with vehicle-infrastructure perception to build controllable and interactive closed-loop test environments' and 'provides actionable evidence for failure diagnosis' is load-bearing, yet the abstract (and by extension the manuscript) provides no quantitative validation, error analysis, synchronization metrics, or details on how data quality and fusion fidelity were ensured. This absence undermines assessment of whether the dataset truly fills the stated gap without introducing new biases or limitations in dynamics feedback.
- [Dataset description] Dataset description section: The composition (20 clips, 15 atomic scenarios, nearly 3 hours of multi-modal data) is outlined, but without accompanying statistics on scenario distribution, coverage of long-tail cases, or validation of closed-loop interaction (e.g., latency between virtual traffic and physical vehicle responses), the utility for comprehensive multi-metric assessment and iterative improvement cannot be verified.
minor comments (3)
- [Introduction] The abstract references limitations in 'many public datasets' but does not cite specific examples or provide a comparison table; adding targeted references and a brief comparison in the introduction would strengthen the motivation.
- [Dataset availability] Clarify the data format, access instructions, and any required platform setup for closed-loop evaluation in the dataset documentation or a dedicated usage section, as the HuggingFace link alone may not suffice for immediate reproducibility.
- [Figures/Tables] Consider including example visualizations or summary statistics (e.g., a table of scenario types and clip durations) to illustrate the 20 clips and rendered observations for readers.
Simulated Author's Rebuttal
We thank the referee for the positive recommendation of minor revision and the constructive feedback on strengthening the presentation of OVPD. We address each major comment below with clarifications and planned revisions.
Point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that OVPD 'integrates virtual background traffic with vehicle-infrastructure perception to build controllable and interactive closed-loop test environments' and 'provides actionable evidence for failure diagnosis' is load-bearing, yet the abstract (and by extension the manuscript) provides no quantitative validation, error analysis, synchronization metrics, or details on how data quality and fusion fidelity were ensured. This absence undermines assessment of whether the dataset truly fills the stated gap without introducing new biases or limitations in dynamics feedback.
Authors: We agree that the abstract would be strengthened by referencing the quality assurance steps. The manuscript describes the virtual-physical integration through the vehicle-in-the-loop setup on the proving ground and digital-twin rendering, but does not include quantitative metrics such as synchronization error or latency bounds. We will revise the abstract to note the challenge-based quality controls (e.g., real-time trajectory consistency and visual rendering checks) and add a short paragraph in the dataset section summarizing these measures. Detailed error analysis and cross-modal synchronization statistics were not performed as part of the dataset release and would require additional post-processing. revision: partial
-
Referee: [Dataset description] Dataset description section: The composition (20 clips, 15 atomic scenarios, nearly 3 hours of multi-modal data) is outlined, but without accompanying statistics on scenario distribution, coverage of long-tail cases, or validation of closed-loop interaction (e.g., latency between virtual traffic and physical vehicle responses), the utility for comprehensive multi-metric assessment and iterative improvement cannot be verified.
Authors: We will include a table in the revised manuscript showing the distribution of the 15 atomic scenarios across the 20 clips to illustrate coverage. The scenarios were selected from the challenge to include long-tail elements such as rare interactions and edge cases. The closed-loop nature is provided by the real vehicle dynamics feedback in the vehicle-in-the-loop testing; however, explicit latency measurements between virtual traffic updates and physical responses were not logged during data collection. revision: partial
- Quantitative synchronization metrics, error bounds, and latency values between virtual traffic and physical vehicle responses are not available from the original challenge data collection and cannot be supplied without new analysis.
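If time-aligned logs of a virtual stimulus and the ego response are available, the missing latency figure could be approximated after the fact. One common approach, sketched below under that assumption, picks the lag that maximizes the normalized cross-correlation between two signals on a shared 10 Hz clock, for example a virtual lead vehicle's deceleration and the ego vehicle's measured deceleration. The signal choice, `max_lag`, and the data are hypothetical. The estimate also mixes system latency (communication, rendering, actuation) with the planner's own reaction time, so on its own it cannot isolate the fusion delay.

```python
from typing import Sequence


def estimate_lag(stimulus: Sequence[float], response: Sequence[float], max_lag: int) -> int:
    """Lag in samples (0..max_lag) at which `response` best follows `stimulus`.

    For each candidate lag, compares stimulus[t] with response[t + lag] over the
    overlapping window using the Pearson (normalised) correlation.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(stimulus), len(response) - lag)
        if n < 2:
            break
        xs, ys = stimulus[:n], response[lag:lag + n]
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        if vx == 0 or vy == 0:
            continue  # a flat window carries no timing information
        score = cov / (vx * vy) ** 0.5
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical 10 Hz signals: the virtual lead vehicle brakes at sample 5,
# and the ego vehicle's measured deceleration peaks at sample 8.
stim = [0.0] * 20
stim[5] = 1.0
resp = [0.0] * 20
resp[8] = 1.0
lag = estimate_lag(stim, resp, max_lag=10)
print(f"estimated lag: {lag} samples = {lag / 10:.1f} s at 10 Hz")
# -> estimated lag: 3 samples = 0.3 s at 10 Hz
```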
Circularity Check
No significant circularity detected
Full rationale
The paper is a dataset release describing OVPD contents (20 clips, multi-modal trajectories, controls, rendered observations) and uses for closed-loop testing. No equations, derivations, fitted parameters, or predictions appear in the abstract or described structure. Central claims rest on direct description of the released data and challenge context rather than any self-referential reduction or self-citation chain.
Reference graph
Works this paper leans on
-
[1]
OVPD: A Virtual-Physical Fusion Testing Dataset of OnSite Autonomous Driving Challenge. Yuhang Zhang¹, Jiarui Zhang¹, Bowen Jian¹, Xin Zhou¹, Zhichao Lv¹, Peng Hang¹, Rongjie Yu¹, Ye Tian¹ and Jian Sun¹,✉. 1 Introduction. Autonomous driving is evolving from driver assistance toward L3+ autonomy, which imposes stricter requirements on comp...
2021
-
[2]
Moreover, route-level aggregation makes it difficult to localize capability deficits at the scenario level (Ericsson, 2000)
A few exemplary visualizations of the OVPD dataset: (a) a roundabout scenario, highlighting yielding and interaction with circulating traffic; (b) a merge scenario, focusing on gap selection and coordination with surrounding vehicles; (c) an obstacle-avoidance scenario in a construction zone; and (d) an unsignalized intersection scenario, emphasizing righ...
2000
-
[3]
Real vehicle dynamics constraints in the data: OVPD is collected in a virtual-physical fusion testing environment centered on real-vehicle-in-the-loop execution. The ego vehicle interacts with virtual background traffic while producing real state, trajectory, and control data, enabling the dataset to capture realistic dynamic constraints for model trai...
2020
-
[4]
represented by nuScenes typically evaluate planned trajectories through regression errors against logged ground truth (e.g., L2 error), offering advantages in implementation simplicity and computational efficiency. Large-scale datasets such as Waymo (Ettinger et al., 2021; Xu et al.,
2021
-
[5]
further provide extensive scenario diversity and rich multi-agent motion data, forming an important foundation for perception, prediction, and planning research. However, despite their large scale and broad coverage, existing autonomous driving datasets are still predominantly designed for modeling or general benchmark evaluation. There remains a lack ...
2024
-
[6]
further targets end-to-end closed-loop evaluation and introduces comfort- and efficiency-related metrics to characterize driving experience. Nevertheless, such benchmarks still provide insufficient coverage of deployment-critical dimensions such as emergency risk handling, traffic-rule compliance, and interaction coordination (Jiang et al., 2025; Kurenkov...
2025
-
[7]
underlying OVPD constructs a virtual–physical fusion environment based on digital twin technology. By integrating high-definition maps, roadside perception, virtual background traffic flow, and real vehicle control, it enables real-time closed-loop interaction between virtual and physical traffic participants. The platform can not only generate and collec...
2025
-
[8]
system, and are integrated with the onboard perception-fusion capability to form a complete testing toolchain that supports Onsite scenario execution and OVPD data logging. The ego vehicle's control commands and motion states are recorded at 10 Hz: high-level planning/decision modules output control commands, which are executed through the onboard cont...
2024
-
[9]
Comfort: Comfort is closely tied to passenger experience and is primarily determined by longitudinal/lateral acceleration, jerk, and vehicle attitude changes (e.g., yaw rate) (De Winkel et al., 2023). OVPD assigns a maximum comfort score of 10 and applies a threshold-based scheme that penalizes the fraction of time exceeding any comfort threshold until the score is exhausted.
2023
-
[10]
Traffic Coordination: Traffic coordination measures how the ego vehicle affects the mobility of surrounding participants and the stability of local traffic flow (Ma et al., 2023; Yu et al., 2021). In OVPD, traffic coordination has a maximum of 10 points and is reported as $G_{\mathrm{coord}} \in [0, 10]$. We identify the set of affected background vehicles using a DBSCAN-...
2023
-
[11]
Rule Compliance: Compliant driving is a prerequisite for deploying autonomous vehicles on public roads (Liu et al., 2025; Manas and Paschke, 2023). Following common high-frequency violation types on urban roads in mainland China, OVPD applies event-level penalties over the following categories: Signal-related violations: e.g., running a red light, ent...
2025
-
[12]
On-Vehicle Deployment and Closed-Loop Execution: In OVPD physical testing, the evaluated algorithm can be deployed via a standardized engineering framework onto an L4-capable autonomous vehicle equipped with a multi-sensor suite and a drive-by-wire chassis. The on-vehicle middleware uses CyberRT (Baidu Apollo Team, 2019), through which the algorithm rece...
2019
-
[13]
OVPD baseline leaderboard (real-vehicle-in-the-loop). Safe = safety, Eff = efficiency, Comf = comfort, Comp = rule compliance, Coord = traffic coordination.
Team  Overall  Safe    Eff    Comf   Comp    Coord  Pass
T11   89.44    100.00  87.39  94.18  96.00   81.34  1.00
T17   89.42    100.00  84.58  97.81  100.00  85.24  1.00
T14   89.29    100.00  85.10  96.69  100.00  85.13  1.00
T16   88.38    100.00  83.87  96.19  99.33   83.99  1.00
T6    85.61    100.00  84.59  88.22  89.67   84.67  1.00
T15   83.98    93.33   80.30  91.85  92...
2025
-
[14]
Scenario diversity supports a layered dataset structure: Within the 15-scenario Onsite baseline, OVPD includes both foundational scenarios and more discriminative ones. As shown in Fig. 6, some scenarios present relatively concentrated metric distributions, reflecting basic capability checks with clear task requirements, while others show broader variation ...
2025
-
[15]
Apollo: An open-source runtime framework for autonomous driving. GitHub. Accessed: 2024-03-19. Caesar, H., Bankiti, V., Lang, A. H., Vora, S., Liong, V. E., Xu, Q., et al.,
2024
-
[16]
VP-AutoTest: A virtual-physical fusion autonomous driving testing platform. arXiv preprint arXiv:2512.07507. Daza, I. G., Izquierdo, R., Martinez, L. M., Benderius, O., Llorca, D. F.,
-
[17]
OpenDRIVE 2010 and beyond: Status and future of the de facto standard for the description of road networks. Proceedings of the Driving Simulation Conference Europe 2010, pp. 231–242. Ericsson, E.,
2010
-
[18]
Baidu Apollo EM motion planner. arXiv preprint arXiv:1807.08048. Gao, S., Yang, J., Chen, L., Chitta, K., Qiu, Y., Geiger, A., et al.,
-
[19]
Interactive adversarial testing of autonomous vehicles with adjustable confrontation intensity. arXiv preprint arXiv:2507.21814. Hafner, D., Pasukonis, J., Ba, J., Lillicrap, T.,
-
[20]
TESSNG: Traffic simulation software (official website). Accessed: 2026-02-26. Karnchanachari, N., Geromichalos, D., Tan, K. S., Li, N., Eriksen, C., Yaghoubi, S., et al.,
2026
-
[21]
Towards learning-based planning: The nuPlan benchmark for real-world autonomous driving. 2024 IEEE International Conference on Robotics and Automation, pp. 629–636. Kerbl, B., Kopanas, G., Leimkuhler, T., Drettakis, G.,
2024
-
[22]
The 2025 Onsite autonomous driving connected joint testing real-vehicle competition concludes successfully, marking new breakthroughs in autonomous driving evaluation research. Sadigh, D., Sastry, S., Seshia, S. A., Dragan, A. D.,
2025
-
[23]
Mind the gap! A study on the transferability of virtual versus physical-world testing of autonomous driving systems. IEEE Transactions on Software Engineering, 49(4), 1928–1940. Stocco, A., Pulfer, B., Tonella, P.,
1928
-
[24]
Mind the gap! A study on the transferability of virtual versus physical-world testing of autonomous driving systems. IEEE Transactions on Software Engineering, 49, 1928–1940. Sun, H., Feng, S., Yan, X., Liu, H. X.,
1928
-
[25]
Rethinking the open-loop evaluation of end-to-end autonomous driving in nuScenes. arXiv preprint arXiv:2305.10430. Zhang, S., Chen, Q., Zhang, X., Qiu, J., Li, X., Li, Y., et al.,
-
[26]
Intelligence evaluation methods for autonomous vehicles. 2025 IEEE International Conference on Robotics and Automation, pp. 10600–10606. Zhou, H., Liu, H., Lu, H., Ma, J., Ji, Y.,
2025
-
[27]
Enhance planning with physics-informed safety controller for end-to-end autonomous driving. 2024 IEEE International Conference on Robotics and Biomimetics, pp. 1775–1782. Author biography: Yuhang Zhang received the B.S. degree from the College of Transportation, Tongji University, Shanghai, China. He is currently pursuing the M.S. degree with the Department...
2024
-
[28]
He is currently pursuing the Ph.D. degree with the Department of Traffic Engineering, Tongji University, Shanghai, China. His main research interests include law-compliance testing and enhancement for autonomous vehicles. Xin Zhou received the B.S. degree in transportation engineering from Tongji University, Shanghai, China. He is currently pursuing ...
2010
-
[29]
From 2020 to 2022, he served as a Research Fellow with the School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore. His research interests include vehicle dynamics and control, decision making, motion planning and motion control for autonomous vehicles. He serves as an Associate Editor of IEEE Internet of Things Journ...
2020
-
[30]
Subsequently, he joined Tongji University as a Lecturer and was promoted to Professor in 2011; he is currently a Professor with the College of Transportation Engineering and the Dean of the Department of Traffic Engineering. His main research interests include traffic flow theory, traffic simulation, connected vehicle-infrastructure systems, and intelligent transportation systems.
2011
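The traffic-coordination snippet (entry [10]) says that affected background vehicles are identified with a DBSCAN-based procedure, but the text is truncated before the details. The sketch below is an illustration only. It runs a plain DBSCAN over 2D positions at a single instant and treats the background vehicles that fall in the ego vehicle's cluster as "affected". The feature choice, `eps`, `min_samples`, and the selection rule are our assumptions, not OVPD's method.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position in metres, common map frame
NOISE = -1


def dbscan(points: Sequence[Point], eps: float, min_samples: int) -> List[int]:
    """Plain O(n^2) DBSCAN. Returns one cluster label per point; NOISE marks outliers.

    A point's neighbourhood includes the point itself (same convention as
    scikit-learn's `min_samples`).
    """
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbours(i: int) -> List[int]:
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join the cluster, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_samples:  # core point: keep expanding
                queue.extend(j_seeds)
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]


def affected_vehicles(ego: Point, others: Dict[str, Point],
                      eps: float = 15.0, min_samples: int = 2) -> List[str]:
    """Hypothetical rule: background vehicles that share the ego vehicle's cluster."""
    ids = list(others)
    labels = dbscan([ego] + [others[k] for k in ids], eps, min_samples)
    if labels[0] == NOISE:
        return []
    return [k for k, lab in zip(ids, labels[1:]) if lab == labels[0]]


# Hypothetical snapshot: two vehicles queued behind the ego, one far ahead.
snapshot = {"bv1": (-8.0, 0.0), "bv2": (-16.0, 0.5), "bv3": (120.0, 3.5)}
print(affected_vehicles((0.0, 0.0), snapshot, eps=10.0))  # -> ['bv1', 'bv2']
```

Density clustering lets the affected set grow along a chain of closely spaced vehicles (bv2 is reached through bv1) without a fixed radius around the ego. In practice, a time window of positions and speeds would likely be a more informative feature set than one snapshot.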