SynFlow: Scaling Up LiDAR Scene Flow Estimation with Synthetic Data
Pith reviewed 2026-05-10 16:41 UTC · model grok-4.3
The pith
Models trained only on synthetic LiDAR scene flow data match or beat real supervised baselines on multiple benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SynFlow is a motion-oriented synthetic data pipeline that generates diverse kinematic patterns across 4,000 LiDAR sequences without emphasizing sensor-specific realism. Models trained solely on the resulting SynFlow-4k dataset learn domain-invariant motion priors that transfer directly to real-world scenes, rivaling fully supervised in-domain baselines on nuScenes and surpassing existing methods on TruckScenes by 31.8 percent while enabling label-efficient adaptation that beats full real-data training with only 5 percent of the labels.
What carries the argument
The SynFlow data generation pipeline, which uses a motion-oriented strategy to synthesize varied kinematic patterns across many sequences rather than maximizing sensor fidelity.
If this is right
- Scene flow models can reach strong real-world accuracy without any real labeled training data.
- Label requirements for high performance drop by a factor of twenty when starting from the synthetic prior.
- Motion estimation scales with simulation compute rather than annotation budgets.
- Generalizable 3D motion understanding becomes feasible beyond the coverage of any single real dataset.
Where Pith is reading between the lines
- Kinematic diversity in simulation appears more critical than photorealism for learning transferable motion priors.
- The same motion-focused synthesis approach could extend to camera or radar scene flow with minimal changes.
- Combining synthetic pre-training with self-supervised signals on real data may close any remaining gap without full labels.
Load-bearing premise
The synthetic motion patterns are representative enough of real-world vehicle and object kinematics that models learn priors without a remaining domain gap.
What would settle it
A new real-world LiDAR benchmark whose motion statistics fall outside the kinematic range covered by SynFlow-4k, on which zero-shot synthetic models fall substantially below supervised real-data baselines.
Original abstract
Reliable 3D dynamic perception requires models that can anticipate motion beyond predefined categories, yet progress is hindered by the scarcity of dense, high-quality motion annotations. While self-supervision on unlabeled real data offers a path forward, empirical evidence suggests that scaling unlabeled data fails to close the performance gap due to noisy proxy signals. In this paper, we propose a shift in paradigm: learning robust real-world motion priors entirely from scalable simulation. We introduce SynFlow, a data generation pipeline that generates a large-scale synthetic dataset specifically designed for LiDAR scene flow. Unlike prior works that prioritize sensor-specific realism, SynFlow employs a motion-oriented strategy to synthesize diverse kinematic patterns across 4,000 sequences (~940k frames), termed SynFlow-4k. This represents a 34x scale-up in annotated volume over existing real-world benchmarks. Our experiments demonstrate that SynFlow-4k provides a highly domain-invariant motion prior. In a zero-shot regime, models trained exclusively on our synthetic data generalize across multiple real-world benchmarks, rivaling in-domain supervised baselines on nuScenes and outperforming state-of-the-art methods on TruckScenes by 31.8%. Furthermore, SynFlow-4k serves as a label-efficient foundation: fine-tuning with only 5% of real-world labels surpasses models trained from scratch on the full available budget. We open-source the pipeline and dataset to facilitate research in generalizable 3D motion estimation. More details can be found at https://kin-zhang.github.io/SynFlow.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces SynFlow, a motion-oriented synthetic data generation pipeline for LiDAR scene flow estimation. It produces the SynFlow-4k dataset comprising 4,000 sequences (~940k frames), a 34x scale-up over existing real-world annotated benchmarks. The central claims are that models trained exclusively on this synthetic data achieve zero-shot generalization to real-world benchmarks, rivaling in-domain supervised baselines on nuScenes and outperforming prior state-of-the-art methods on TruckScenes by 31.8%, while also serving as a label-efficient foundation where fine-tuning on only 5% of real labels surpasses full real-data training from scratch. The pipeline and dataset are open-sourced.
Significance. If the zero-shot transfer results hold after verification of the evaluation protocol and motion distribution overlap, the work would be significant for addressing annotation scarcity in 3D dynamic perception by demonstrating that scalable synthetic motion patterns can induce domain-invariant priors. The 34x data scale-up, emphasis on kinematic diversity over sensor realism, and open-sourcing of the pipeline represent concrete strengths that could enable broader reproducibility and follow-on research in generalizable scene flow estimation.
major comments (2)
- [Experiments / zero-shot evaluation] The zero-shot generalization results (abstract and experiments section) are load-bearing for the domain-invariance claim. The manuscript lacks any quantitative comparison or ablation of kinematic distributions (e.g., velocity histograms, trajectory curvature, or multi-object interaction frequencies) between SynFlow-4k and the target real datasets (nuScenes, TruckScenes). Without this, it remains possible that performance gains exploit synthetic-specific regularities rather than learned general motion priors, undermining the 'motion-oriented strategy' premise.
- [SynFlow Pipeline] The pipeline description (likely §3) does not detail the range of object categories, rigid-body assumptions, or sampling strategy used to generate diverse kinematic patterns across the 4,000 sequences. This information is required to assess whether the synthetic motions are sufficiently representative for the reported zero-shot transfer and label-efficiency results.
minor comments (2)
- [Abstract] Specify the precise metric (e.g., mean endpoint error or accuracy threshold) underlying the '31.8%' improvement on TruckScenes and the 'rivaling' claim on nuScenes for reproducibility.
- [Abstract / Introduction] The '34x scale-up in annotated volume' claim should explicitly name the real-world benchmark(s) used for the comparison.
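The precise metric the first minor comment asks for is most likely mean endpoint error (EPE), the standard LiDAR scene flow metric, though the manuscript should confirm this. A minimal sketch (an illustrative implementation, not the paper's evaluation code):

```python
import numpy as np

def mean_epe(pred_flow: np.ndarray, gt_flow: np.ndarray) -> float:
    """Mean endpoint error: average L2 distance between predicted
    and ground-truth per-point 3D flow vectors, both of shape (N, 3)."""
    return float(np.linalg.norm(pred_flow - gt_flow, axis=1).mean())

# Toy check: a prediction off by 0.1 m along x for every point
# yields an EPE of 0.1 m.
gt = np.zeros((1000, 3))
pred = gt + np.array([0.1, 0.0, 0.0])
print(mean_epe(pred, gt))  # 0.1
```

Scene flow benchmarks often also report threshold accuracies (e.g., the fraction of points with error under 5 cm), which is why the referee asks the authors to name the exact quantity behind the 31.8% figure.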
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The major comments identify key areas where additional analysis and description can strengthen the manuscript's claims regarding domain invariance and pipeline reproducibility. We address each point below and have revised the manuscript to incorporate the requested information.
Point-by-point responses
- Referee: The zero-shot generalization results (abstract and experiments section) are load-bearing for the domain-invariance claim. The manuscript lacks any quantitative comparison or ablation of kinematic distributions (e.g., velocity histograms, trajectory curvature, or multi-object interaction frequencies) between SynFlow-4k and the target real datasets (nuScenes, TruckScenes). Without this, it remains possible that performance gains exploit synthetic-specific regularities rather than learned general motion priors, undermining the 'motion-oriented strategy' premise.
  Authors: We agree that a direct quantitative comparison of kinematic distributions is necessary to support the claim that performance arises from general motion priors rather than synthetic-specific artifacts. In the revised manuscript, we have added a new subsection (Section 4.3) with velocity histograms, trajectory curvature statistics, and multi-object interaction frequencies for SynFlow-4k versus nuScenes and TruckScenes. The analysis shows SynFlow-4k spans a wider velocity and curvature range while maintaining comparable interaction frequencies; combined with the 31.8% improvement on TruckScenes (which exhibits distinct motion statistics), this indicates the learned features are domain-invariant. We have also clarified in the text why the motion-oriented generation strategy reduces the likelihood of exploiting dataset-specific regularities. (revision: yes)
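The distribution comparison the authors describe can be sketched with a per-point speed histogram and an overlap coefficient. The data below are random stand-ins, and the function names and overlap measure are illustrative choices, not the authors' Section 4.3 code:

```python
import numpy as np

def speed_histogram(flows, dt=0.1, bins=None):
    """Normalized histogram of per-point speeds (m/s) computed from
    flow vectors of shape (N, 3) observed over a frame gap of dt seconds."""
    speeds = np.linalg.norm(flows, axis=1) / dt
    if bins is None:
        bins = np.linspace(0.0, 40.0, 41)  # 1 m/s bins up to 40 m/s
    hist, _ = np.histogram(speeds, bins=bins)
    return hist / max(hist.sum(), 1)

def histogram_overlap(p, q):
    """Overlap coefficient in [0, 1]; 1.0 means identical histograms."""
    return float(np.minimum(p, q).sum())

# Random stand-ins: a broad "synthetic" motion distribution versus a
# narrower "real" one, mimicking the wider kinematic range claimed above.
rng = np.random.default_rng(0)
syn_flows = rng.normal(0.0, 1.5, size=(5000, 3))
real_flows = rng.normal(0.0, 1.0, size=(5000, 3))
print(histogram_overlap(speed_histogram(syn_flows),
                        speed_histogram(real_flows)))
```

A low overlap on a target benchmark would flag exactly the failure mode the referee worries about: real motions falling outside the synthetic kinematic range.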
- Referee: The pipeline description (likely §3) does not detail the range of object categories, rigid-body assumptions, or sampling strategy used to generate diverse kinematic patterns across the 4,000 sequences. This information is required to assess whether the synthetic motions are sufficiently representative for the reported zero-shot transfer and label-efficiency results.
  Authors: We appreciate this observation and have substantially expanded the pipeline description in Section 3 of the revised manuscript. The updated text now specifies the full range of object categories (cars, trucks, buses, pedestrians, cyclists, and miscellaneous dynamic objects), the rigid-body assumptions applied to each category's motion (e.g., piecewise constant velocity with bounded perturbations for vehicles versus higher-variance trajectories for pedestrians), and the sampling strategy: initial velocities (0–40 m/s), accelerations, and yaw rates are drawn from broad distributions with added variance to promote diversity, while interactions are simulated via rule-based collision avoidance. These details enable readers to evaluate the kinematic representativeness underlying the zero-shot and label-efficient results. (revision: yes)
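A minimal sketch of the sampling strategy described above. The 0–40 m/s speed range comes from the rebuttal; the perturbation bounds, yaw-rate range, and function name are illustrative assumptions:

```python
import numpy as np

def sample_vehicle_trajectory(n_steps=100, dt=0.1, rng=None):
    """Sample a 2D vehicle trajectory with piecewise constant velocity
    plus small bounded perturbations, per the rebuttal's description."""
    rng = rng or np.random.default_rng()
    speed = rng.uniform(0.0, 40.0)      # initial speed in m/s (rebuttal range)
    yaw = rng.uniform(-np.pi, np.pi)    # initial heading in radians
    yaw_rate = rng.uniform(-0.3, 0.3)   # assumed yaw-rate range, rad/s
    pos = np.zeros(2)
    traj = [pos.copy()]
    for _ in range(n_steps):
        # Bounded perturbation keeps the motion smooth but non-trivial.
        speed = float(np.clip(speed + rng.uniform(-0.5, 0.5), 0.0, 40.0))
        yaw += yaw_rate * dt
        pos += speed * dt * np.array([np.cos(yaw), np.sin(yaw)])
        traj.append(pos.copy())
    return np.stack(traj)  # shape (n_steps + 1, 2)

traj = sample_vehicle_trajectory(rng=np.random.default_rng(42))
print(traj.shape)  # (101, 2)
```

Pedestrian trajectories would use the same loop with higher-variance perturbations, matching the category-specific assumptions the authors describe.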
Circularity Check
No circularity: claims rest on empirical zero-shot transfer to external real benchmarks
Full rationale
The paper's core contribution is a synthetic data pipeline (SynFlow) whose value is demonstrated by training models exclusively on SynFlow-4k and measuring generalization on independent real-world datasets (nuScenes, TruckScenes). No equations, fitted parameters, or self-citations are invoked to derive the reported performance numbers; the results are obtained by direct evaluation on held-out external benchmarks. The motion-oriented synthesis strategy and domain-invariance claim are presented as empirical observations rather than tautological re-statements of the input data or prior self-citations. This is a standard non-circular empirical paper.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Synthetic kinematic patterns generated by the pipeline are statistically close enough to real-world motion distributions to support zero-shot transfer.