pith. machine review for the scientific record.

arxiv: 2604.21471 · v1 · submitted 2026-04-23 · 💻 cs.RO

Recognition: unknown

Ufil: A Unified Framework for Infrastructure-based Localization

Authors on Pith no claims yet

Pith reviewed 2026-05-09 21:54 UTC · model grok-4.3

classification 💻 cs.RO
keywords infrastructure-based localization · multi-object tracking · sensor fusion · cooperative awareness messages · roadside sensors · unified framework · ROS 2 · CAV testbed

The pith

Ufil standardizes object models and tracking interfaces so researchers can fuse vehicle messages, lidar, and road sensors into one pipeline without rebuilding it for each new source.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Ufil as a framework that separates the common parts of infrastructure localization from the sensor-specific parts. It supplies a single object model plus reusable pieces for prediction, detection, association, state update, and track management, each with clear interfaces. This design lets developers plug in different data sources and run the identical pipeline in both simulation and physical testbeds. The authors integrate three heterogeneous inputs and report lane-level accuracy with low latency in scenarios involving hundreds of vehicles. A reader would care because repeated custom stacks have slowed progress on road-user tracking for safety applications, and a shared base could let more groups test improvements faster.
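The plug-in design described here can be sketched as interface plus interchangeable implementation. The names below (ObjectState, Predictor, ConstantVelocity) are illustrative assumptions, not Ufil's published API:

```cpp
#include <cassert>

// Illustrative stand-in for the standardized object model (not Ufil's actual definition).
struct ObjectState {
    double x, y;    // position in a shared map frame [m]
    double vx, vy;  // velocity [m/s]
};

// Each pipeline step is an interface; implementations are interchangeable.
struct Predictor {
    virtual ObjectState predict(const ObjectState& s, double dt) const = 0;
    virtual ~Predictor() = default;
};

// One concrete choice: a constant-velocity motion model.
struct ConstantVelocity : Predictor {
    ObjectState predict(const ObjectState& s, double dt) const override {
        return {s.x + s.vx * dt, s.y + s.vy * dt, s.vx, s.vy};
    }
};

// The pipeline sees only the interface, so a new motion model can be
// swapped in without touching the rest of the code.
ObjectState step(const Predictor& p, const ObjectState& s, double dt) {
    return p.predict(s, dt);
}
```

The same interface-per-step pattern would apply to detection, association, state update, and track management.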

Core claim

Ufil supplies a standardized object model together with reusable multi-object tracking components that expose interfaces for prediction, detection, association, state update, and track management. These pieces allow three different data sources—ETSI ITS-G5 Cooperative Awareness Messages from vehicles, lidar detections from roadside nodes, and measurements from an in-road sensitive surface—to feed a single localization pipeline that executes without modification in the CARLA simulator and in the CPM Lab testbed. In a three-lane highway scenario the fused output reaches mean lateral position RMSEs of 0.31 m in simulation and 0.29 m on the physical platform, with mean absolute orientation errors around 2.2° and median end-to-end latencies below 100 ms.
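As a reminder of the metric behind these accuracy numbers, lateral RMSE is the square root of the mean squared lateral error over the tracked objects. A generic sketch of the metric, not code from the Ufil release:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Root-mean-square error over a set of per-object lateral errors [m].
// Generic metric sketch; assumes errors.size() > 0.
inline double rmse(const std::vector<double>& errors) {
    double sum_sq = 0.0;
    for (double e : errors) sum_sq += e * e;
    return std::sqrt(sum_sq / static_cast<double>(errors.size()));
}
```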

What carries the argument

The standardized object model and reusable multi-object tracking components with defined interfaces for each processing step.

If this is right

  • Any single component such as a new association method can be swapped in while the rest of the pipeline stays intact.
  • The same code base produces comparable accuracy and latency whether run with hundreds of simulated vehicles or dozens in a physical lab.
  • Fusing the three chosen modalities already yields lateral RMSE below 0.32 m and orientation error near 2.2 degrees.
  • End-to-end latency from each modality to the fused state estimate stays under 100 ms in both environments.
  • Open reference implementations and examples make it possible to test new sensors or algorithms without starting from scratch.
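The first bullet, component swapping, can be made concrete with one candidate association step: a gated greedy nearest-neighbor associator. The function name, gating scheme, and data layout are our illustrative assumptions, not the paper's reference implementation:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };  // simplified 2-D positions [m]

// Greedy nearest-neighbor association within a gate radius.
// Returns, for each detection, the index of the matched track, or -1 if
// no unused track lies within the gate. An interchangeable pipeline step.
std::vector<int> associate(const std::vector<Point>& dets,
                           const std::vector<Point>& tracks,
                           double gate) {
    std::vector<int> match(dets.size(), -1);
    std::vector<bool> used(tracks.size(), false);
    for (std::size_t d = 0; d < dets.size(); ++d) {
        double best = gate;
        for (std::size_t t = 0; t < tracks.size(); ++t) {
            if (used[t]) continue;
            double dist = std::hypot(dets[d].x - tracks[t].x,
                                     dets[d].y - tracks[t].y);
            if (dist < best) { best = dist; match[d] = static_cast<int>(t); }
        }
        if (match[d] >= 0) used[static_cast<std::size_t>(match[d])] = true;
    }
    return match;
}
```

Replacing this with, say, a global-nearest-neighbor or JPDA variant would leave the surrounding prediction and update steps untouched, which is exactly the swap the framework's interfaces are meant to permit.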

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same interface design could support additional roadside sensors such as radar or additional ITS message types with little extra work.
  • If the object model proves general enough, it might serve as a common exchange format for other infrastructure projects beyond this framework.
  • The scale-independent execution suggests the approach could move from small testbeds to larger intersections or corridors without code changes.
  • Community contributions to individual tracking steps could accumulate into a shared library of interchangeable components.

Load-bearing premise

That one object model and set of tracking interfaces can represent and combine states from very different sensor types without needing substantial per-application changes.
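This premise amounts to one struct that very different inputs map onto. A minimal sketch, assuming inputs arrive already referenced to a shared map frame; field names, units, and the nominal CAM variance are ours, not the paper's:

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// Illustrative shared object model (not Ufil's actual definition).
struct Object {
    double x, y;      // position in a common map frame [m]
    double heading;   // orientation [rad]
    double pos_var;   // position variance [m^2], source-dependent
};

// CAM-like input: heading arrives in degrees (assumed map-referenced).
struct CamMessage { double x_m, y_m, heading_deg; };

// Lidar detection: pose in radians plus a sensor noise estimate.
struct LidarDetection { double cx, cy, yaw_rad, noise_var; };

Object fromCam(const CamMessage& m) {
    // 1.0 m^2 is a placeholder nominal variance, not a measured value.
    return {m.x_m, m.y_m, m.heading_deg * kPi / 180.0, 1.0};
}
Object fromLidar(const LidarDetection& d) {
    return {d.cx, d.cy, d.yaw_rad, d.noise_var};
}
```

If a fourth source (e.g. camera detections) cannot be normalized this cleanly, the premise fails.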

What would settle it

Add a fourth sensor type, such as camera detections, to the pipeline and measure whether the existing interfaces still accept the data, maintain sub-100 ms median latency, and preserve lane-level accuracy without rewriting core components.
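The latency half of that test hinges on a median over per-message end-to-end times. A sketch of the check, with placeholder values rather than measurements:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median over per-message end-to-end latencies [ms]; assumes v is non-empty.
inline double median_ms(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// True when a modality meets the latency budget (100 ms by default,
// matching the paper's reported criterion).
inline bool meets_budget(const std::vector<double>& latencies,
                         double budget_ms = 100.0) {
    return median_ms(latencies) < budget_ms;
}
```

In practice the latencies would come from timestamps recorded at sensor ingest and at fused-output publication for each message of the new modality.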

Figures

Figures reproduced from arXiv: 2604.21471 by Bassam Alrifaee, Lucas Hegerath, Marius Molz, Massimo Marcon, Simon Schäfer.

Figure 1
Figure 1: Object definition in a bird’s eye view. view at source ↗
Figure 2
Figure 2: Runtimes of the four solvers implemented in Ufil. … view at source ↗
Figure 3
Figure 3: Results of the simple simulation crossing example for different … view at source ↗
Figure 4
Figure 4: Testing scenario: a three-lane highway segment with SSL, lidar OSN, … view at source ↗
Figure 5
Figure 5: Summary of state accuracy. For each domain, we plot the error for … view at source ↗
read the original abstract

Infrastructure-based localization enhances road safety and traffic management by providing state estimates of road users. Development is hindered by fragmented, application-specific stacks that tightly couple perception, tracking, and middleware. We introduce Ufil, a Unified Framework for Infrastructure-Based Localization with a standardized object model and reusable multi-object tracking components. Ufil offers interfaces and reference implementations for prediction, detection, association, state update, and track management, allowing researchers to improve components without reimplementing the pipeline. Ufil is open-source C++/ROS 2 software with documentation and executable examples. We demonstrate Ufil by integrating three heterogeneous data sources into a single localization pipeline combining (i) vehicle onboard units broadcasting ETSI ITS-G5 Cooperative Awareness Messages, (ii) a lidar-based roadside sensor node, and (iii) an in-road sensitive surface layer. The pipeline runs unchanged in the CARLA simulator and a small-scale CAV testbed, demonstrating Ufil's scale-independent execution model. In a three-lane highway scenario with 423 and 355 vehicles in simulation and testbed, respectively, the fused system achieves lane-level lateral accuracy with mean lateral position RMSEs of 0.31 m in CARLA and 0.29 m in the CPM Lab, and mean absolute orientation errors around 2.2{\deg}. Median end-to-end latencies from sensing to fused output remain below 100 ms across all modalities in both environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces Ufil, a unified open-source C++/ROS 2 framework for infrastructure-based localization. It defines a standardized object model together with reusable multi-object tracking components and reference implementations for prediction, detection, association, state update, and track management. The framework is demonstrated by fusing three heterogeneous sources—ETSI ITS-G5 CAM messages, lidar roadside units, and in-road sensitive surfaces—into a single pipeline that runs unchanged in the CARLA simulator and the CPM Lab testbed. In a three-lane highway scenario the fused system reports mean lateral position RMSEs of 0.31 m (CARLA) and 0.29 m (CPM Lab), mean absolute orientation errors of approximately 2.2°, and median end-to-end latencies below 100 ms across modalities.

Significance. If the reusability and cross-platform claims hold, Ufil could reduce redundant engineering effort in infrastructure localization research by supplying a common, modular pipeline that accommodates heterogeneous sensors and executes identically in simulation and hardware. The open-source release with documentation and examples, together with concrete accuracy and latency numbers obtained in two distinct environments, provides a verifiable starting point for further component-level improvements.

major comments (2)
  1. [Evaluation] Evaluation section: the reported RMSE and latency figures are given as single mean values without accompanying standard deviations, number of independent runs, or statistical tests. This weakens the robustness claim for lane-level accuracy and sub-100 ms performance, especially given the modest vehicle counts (423 in simulation, 355 in testbed).
  2. [Framework and Evaluation] Framework description and evaluation: while the paper asserts that the standardized object model and reusable components enable integration “without reimplementing the pipeline,” no quantitative evidence (e.g., lines of code changed or time required) is supplied for adding a fourth sensor modality or porting to a new application. This leaves the weakest assumption untested.
minor comments (3)
  1. [Abstract and Introduction] Abstract and §1: several acronyms (CAM, CPM Lab, ETSI ITS-G5) appear before their first definitions; add a short acronym table or inline expansions on first use.
  2. [Implementation] The manuscript would benefit from a short table comparing the three input modalities (message format, update rate, typical noise characteristics) to clarify how the standardized object model normalizes them.
  3. [Evaluation] Figure captions and axis labels in the results plots should explicitly state the number of vehicles or time windows used for each RMSE and latency statistic.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the positive evaluation and recommendation for minor revision. We address the major comments point by point below.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the reported RMSE and latency figures are given as single mean values without accompanying standard deviations, number of independent runs, or statistical tests. This weakens the robustness claim for lane-level accuracy and sub-100 ms performance, especially given the modest vehicle counts (423 in simulation, 355 in testbed).

    Authors: We appreciate this observation. The reported mean RMSE and latency values are computed over all vehicles in the single execution of the three-lane highway scenario in each environment (423 vehicles in CARLA, 355 in CPM Lab). We did not perform multiple independent runs or statistical tests, as the evaluation focus was on demonstrating cross-platform operation of the unified pipeline rather than statistical properties of the tracking algorithm. In the revised manuscript we will explicitly note the single-run nature of the experiments, restate the vehicle counts, and qualify the robustness claims to reflect that these results establish feasibility across simulation and hardware. revision: yes

  2. Referee: [Framework and Evaluation] Framework description and evaluation: while the paper asserts that the standardized object model and reusable components enable integration “without reimplementing the pipeline,” no quantitative evidence (e.g., lines of code changed or time required) is supplied for adding a fourth sensor modality or porting to a new application. This leaves the weakest assumption untested.

    Authors: We agree that quantitative integration-effort metrics would strengthen the reusability argument. The current evaluation shows successful fusion of three heterogeneous sources (CAM, lidar, sensitive surfaces) using the standardized interfaces without altering the core pipeline. We did not, however, conduct a fourth-modality experiment or record lines-of-code or time measurements. In revision we will expand the discussion of the object model and component interfaces to illustrate extensibility using the existing three-modality case as qualitative support. We cannot supply the requested quantitative figures because such measurements were not collected. revision: partial

standing simulated objections not resolved
  • Quantitative evidence (lines of code changed or developer time) for adding a fourth sensor modality or porting to a new application, as no such experiment was performed.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper describes an open-source software framework (Ufil) for integrating heterogeneous infrastructure sensors into a unified multi-object tracking pipeline. Its core claims rest on empirical results from CARLA simulation and CPM Lab testbed runs, with reported lateral RMSEs (0.31 m / 0.29 m) and sub-100 ms latencies. No equations, parameter fits, or predictions appear in the provided text; the work supplies reference implementations and an open-source release for independent verification. No self-citations, ansatzes, or renamings reduce the central claims to inputs by construction. This is a standard engineering demonstration paper with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central contribution is the software framework itself rather than new physical principles or fitted parameters; relies on established assumptions in sensor fusion and middleware.

axioms (1)
  • domain assumption Standard robotics middleware like ROS 2 can reliably handle real-time data fusion from multiple sensors
    The framework is implemented in ROS 2 and assumes its communication and timing capabilities support the localization pipeline.
invented entities (1)
  • Standardized object model no independent evidence
    purpose: To enable reusable multi-object tracking components across different applications
    The paper introduces this model as part of the unified framework to decouple perception, tracking, and middleware components.

pith-pipeline@v0.9.0 · 5562 in / 1535 out tokens · 53370 ms · 2026-05-09T21:54:58.513882+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

29 extracted references · 4 canonical work pages

  1. [1]

    Autoware on board: Enabling autonomous vehicles with embedded systems,

    S. Kato, S. Tokunaga, Y. Maruyama, S. Maeda, M. Hirabayashi, Y. Kitsukawa, A. Monrroy, T. Ando, Y. Fujii, and T. Azumi, “Autoware on board: Enabling autonomous vehicles with embedded systems,” in Proceedings of the 9th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS), 2018, pp. 287–296

  2. [2]

    Edgar: An autonomous driving research platform – from feature development to real-world application,

    P. Karle, T. Betz, M. Bosk, F. Fent, N. Gehrke, M. Geisslinger, L. Gressenbuch, P. Hafemann, S. Huber, M. Hübner, S. Huch, G. Kaljavesi, T. Kerbl, D. Kulmer, T. Mascetta, S. Maierhofer, F. Pfab, F. Rezabek, E. Rivera, S. Sagmeister, L. Seidlitz, F. Sauerbeck, I. Tahiraj, R. Trauth, N. Uhlemann, G. Würsching, B. Zarrouki, M. Althoff, J. Betz, K. Bengle...

  3. [3]

    Automated driving toolbox,

    The MathWorks, Inc., “Automated driving toolbox,” Natick, Massachusetts, United States, 2025, version: 2025a. [Online]. Available: https://www.mathworks.com/help/driving/index.html

  4. [4]

    A survey on small-scale testbeds for connected and automated vehicles and robot swarms: A guide for creating a new testbed,

    A. Mokhtarian, J. Xu, P. Scheffe, M. Kloock, S. Schäfer, H. Bang, V.-A. Le, S. Ulhas, J. Betz, S. Wilson, S. Berman, L. Paull, A. Prorok, and B. Alrifaee, “A survey on small-scale testbeds for connected and automated vehicles and robot swarms: A guide for creating a new testbed,” IEEE Robotics & Automation Magazine, 2024

  5. [5]

    From small-scale to full-scale: Assessing the potential for transferability of experimental results in small-scale cav testbeds,

    S. Schäfer and B. Alrifaee, “From small-scale to full-scale: Assessing the potential for transferability of experimental results in small-scale cav testbeds,” in IEEE International Conference on Vehicular Electronics and Safety (ICVES), 2024

  6. [6]

    Choose your simulator wisely: A review on open-source simulators for autonomous driving,

    Y. Li, W. Yuan, S. Zhang, W. Yan, Q. Shen, C. Wang, and M. Yang, “Choose your simulator wisely: A review on open-source simulators for autonomous driving,” IEEE Transactions on Intelligent Vehicles, vol. 9, no. 5, pp. 4861–4876, 2024

  7. [7]

    CARLOS: An Open, Modular, and Scalable Simulation Framework for the Development and Testing of Software for C-ITS,

    C. Geller, B. Haas, A. Kloeker, J. Hermens, B. Lampe, T. Beemelmanns, and L. Eckstein, “CARLOS: An Open, Modular, and Scalable Simulation Framework for the Development and Testing of Software for C-ITS,” in IEEE Intelligent Vehicles Symposium (IV), 2024, pp. 3100–3106

  8. [8]

    A9-dataset: Multi-sensor infrastructure-based dataset for mobility research,

    C. Creß, W. Zimmer, L. Strand, M. Fortkord, S. Dai, V. Lakshminarasimhan, and A. Knoll, “A9-dataset: Multi-sensor infrastructure-based dataset for mobility research,” in 2022 IEEE Intelligent Vehicles Symposium (IV), 2022, pp. 965–970

  9. [9]

    Lumpi: The leibniz university multi-perspective intersection dataset,

    S. Busch, C. Koetsier, J. Axmann, and C. Brenner, “Lumpi: The leibniz university multi-perspective intersection dataset,” in 2022 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2022, pp. 1127–1134

  10. [10]

    Urbaning-v2x: A large-scale multi-vehicle, multi-infrastructure dataset across multiple intersections for cooperative perception,

    K. C. Sekaran, M. Geisler, D. Rößle, A. Mohan, D. Cremers, W. Utschick, M. Botsch, W. Huber, and T. Schön, “Urbaning-v2x: A large-scale multi-vehicle, multi-infrastructure dataset across multiple intersections for cooperative perception,” 2025. [Online]. Available: https://arxiv.org/abs/2510.23478

  11. [11]

    Collaboration helps camera overtake lidar in 3d detection,

    Y. Hu, Y. Lu, R. Xu, W. Xie, S. Chen, and Y. Wang, “Collaboration helps camera overtake lidar in 3d detection,” in The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  12. [12]

    Dair-v2x: A large-scale dataset for vehicle-infrastructure cooperative 3d object detection,

    H. Yu, Y. Luo, M. Shu, Y. Huo, Z. Yang, Y. Shi, Z. Guo, H. Li, X. Hu, J. Yuan, and Z. Nie, “Dair-v2x: A large-scale dataset for vehicle-infrastructure cooperative 3d object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 21361–21370

  13. [13]

    Semi-automatic annotation of 3d radar and camera for smart infrastructure-based perception,

    S. Agrawal, S. Bhanderi, and G. Elger, “Semi-automatic annotation of 3d radar and camera for smart infrastructure-based perception,” IEEE Access, vol. 12, pp. 34325–34341, 2024

  14. [14]

    Coopscenes: Multi-scene infrastructure and vehicle data for advancing collective perception in autonomous driving,

    M. Vosshans, A. Baumann, M. Drueppel, O. Ait-Aider, Y. Mezouar, T. Dang, and M. Enzweiler, “Coopscenes: Multi-scene infrastructure and vehicle data for advancing collective perception in autonomous driving,” in 2025 IEEE Intelligent Vehicles Symposium (IV), 2025, pp. 1040–1047

  15. [15]

    2.5d object detection for intelligent roadside infrastructure,

    N. Polley, Y. Boualili, F. Mütsch, M. Zipfl, T. Fleck, and J. M. Zöllner, “2.5d object detection for intelligent roadside infrastructure,” 2025. [Online]. Available: https://arxiv.org/abs/2507.03564

  16. [16]

    Set-theoretic localization for mobile robots with infrastructure-based sensing,

    X. Li, Y. Li, N. Li, A. Girard, and I. Kolmanovsky, “Set-theoretic localization for mobile robots with infrastructure-based sensing,” Advanced Control for Applications: Engineering and Industrial Systems, vol. 5, no. 1, p. e117, 2023

  17. [17]

    An extensible framework for open heterogeneous collaborative perception,

    Y. Lu, Y. Hu, Y. Zhong, D. Wang, S. Chen, and Y. Wang, “An extensible framework for open heterogeneous collaborative perception,” in The Twelfth International Conference on Learning Representations, 2024

  18. [18]

    Comparison and evaluation of advanced motion models for vehicle tracking,

    R. Schubert, E. Richter, and G. Wanielik, “Comparison and evaluation of advanced motion models for vehicle tracking,” in 2008 11th International Conference on Information Fusion, 2008, pp. 1–6

  19. [19]

    Estimation with applications to tracking and navigation: theory algorithms and software,

    Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with applications to tracking and navigation: theory algorithms and software. John Wiley & Sons, 2002

  20. [20]

    Filtering for stochastic processes with applications to guidance,

    R. S. Bucy and P. D. Joseph, Filtering for stochastic processes with applications to guidance. Chelsea Pub. Co, 1987

  21. [21]

    Object-level fusion for surround environment perception in automated driving applications,

    M. Aeberhard, Object-level fusion for surround environment perception in automated driving applications. VDI Verlag, 2017

  22. [22]

    Lidar-based tracking of traffic participants with sensor nodes in existing urban infrastructure,

    S. Schäfer, B. Alrifaee, and E. Hashemi, “Lidar-based tracking of traffic participants with sensor nodes in existing urban infrastructure.” [Online]. Available: https://arxiv.org/abs/2509.20009

  24. [24]

    V2aix: A multi-modal real-world dataset of etsi its v2x messages in public road traffic,

    G. Kueppers, J.-P. Busch, L. Reiher, and L. Eckstein, “V2aix: A multi-modal real-world dataset of etsi its v2x messages in public road traffic,” in IEEE International Conference on Intelligent Transportation Systems (ITSC), 2024, pp. 392–398

  25. [25]

    Investigating a pressure sensitive surface layer for vehicle localization,

    S. Schäfer, H. Steidl, S. Kowalewski, and B. Alrifaee, “Investigating a pressure sensitive surface layer for vehicle localization,” in IEEE Intelligent Vehicles Symposium (IV), 2023

  26. [26]

    CARLA: An open urban driving simulator,

    A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, ser. Proceedings of Machine Learning Research, S. Levine, V. Vanhoucke, and K. Goldberg, Eds., vol. 78. PMLR, 11 2017, pp. 1–16

  27. [27]

    Cyber-physical mobility lab: An open-source platform for networked and autonomous vehicles,

    M. Kloock, P. Scheffe, J. Maczijewski, A. Kampmann, A. Mokhtarian, S. Kowalewski, and B. Alrifaee, “Cyber-physical mobility lab: An open-source platform for networked and autonomous vehicles,” in EUCA European Control Conference (ECC), 2021, pp. 1937–1944

  28. [28]

    Networked and autonomous model-scale vehicles for experiments in research and education,

    P. Scheffe, J. Maczijewski, M. Kloock, A. Kampmann, A. Derks, S. Kowalewski, and B. Alrifaee, “Networked and autonomous model-scale vehicles for experiments in research and education,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 17332–17337, 2020

  29. [29]

    Vision-based real-time indoor positioning system for multiple vehicles,

    M. Kloock, P. Scheffe, I. Tülleners, J. Maczijewski, S. Kowalewski, and B. Alrifaee, “Vision-based real-time indoor positioning system for multiple vehicles,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 15446–15453, 2020