
arxiv: 2606.10641 · v1 · pith:HTKXU7LLnew · submitted 2026-06-09 · 💻 cs.NI

CAMASA: A CAM-based Dataset from the MASA Living Lab

Pith reviewed 2026-06-27 11:38 UTC · model grok-4.3

classification 💻 cs.NI
keywords CAMASA · dataset · C-ITS · trajectory prediction · V2X · CAM · DENM · urban mobility

The pith

CAMASA supplies over 14,000 km of real urban vehicle trajectories reconstructed from millions of V2X messages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CAMASA as a large infrastructure-based dataset built from Cooperative Awareness Messages and Decentralized Environmental Notification Messages collected in the Modena Automotive Smart Area. It applies filtering, pseudonym reconciliation for station ID changes, and normalization to 10 Hz to produce usable trajectories under authentic traffic conditions. This addresses the shortage of real-world V2X data for trajectory prediction and related C-ITS tasks that synthetic traces or limited sensor datasets cannot fully replace. The resulting collection of more than 40 million CAMs and 2 million DENMs spans multiple months and tens of thousands of unique stations.

Core claim

CAMASA provides a statistically significant empirical foundation for Cooperative Intelligent Transportation Systems research by delivering over 14,000 km of reconstructed vehicle paths from authentic urban CAM and DENM collections, after a preprocessing pipeline that accounts for privacy-driven identifier changes and produces 10 Hz time-series suitable for motion forecasting and simulator calibration.

What carries the argument

The preprocessing pipeline that filters raw messages, reconciles pseudonyms across ETSI stationID changes, and normalizes trajectories to 10 Hz.
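
The reconciliation step can be sketched as a reachability test between the last CAM of one stationID and the first CAM of the next. This is a minimal sketch, not the authors' implementation: the 50 km/h speed bound and the 1.5 s window come from the paper excerpt shown under Figure 5, while the function names, the message layout (`t`, `lat`, `lon`), and the use of great-circle distance are assumptions made for illustration.

```python
import math

MAX_SPEED_MPS = 50 / 3.6   # 50 km/h bound quoted in the paper excerpt
MAX_GAP_S = 1.5            # temporal threshold quoted in the paper excerpt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def same_vehicle(last_msg, first_msg):
    """Return True if two CAMs with different stationIDs plausibly belong together.

    last_msg  : last CAM seen under the old stationID
    first_msg : first CAM seen under the new stationID
    Both are dicts with hypothetical keys 't' (seconds), 'lat', 'lon' (degrees).
    """
    dt = first_msg["t"] - last_msg["t"]
    if not 0 < dt <= MAX_GAP_S:
        return False
    dist = haversine_m(last_msg["lat"], last_msg["lon"],
                       first_msg["lat"], first_msg["lon"])
    # The new position must be reachable at the speed bound within dt.
    return dist <= MAX_SPEED_MPS * dt
```

A real pipeline would also need a tie-breaking rule when several new stationIDs satisfy the test at once; the excerpt available here does not say how the paper handles that case.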

If this is right

  • Trajectory prediction models for autonomous and cooperative driving can be trained and evaluated on real V2X dynamics rather than synthetic traces.
  • Microscopic traffic simulators such as SUMO can be calibrated directly against observed urban mobility patterns and communication coverage.
  • ITS Digital Twins can jointly model vehicle movement and V2X message propagation using data from an actual deployment.
  • Time-series analysis techniques can be applied to study privacy-induced identifier changes and their impact on trajectory continuity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dataset could support cross-city comparisons if similar CAM/DENM collections are gathered in other living labs with comparable infrastructure.
  • Models trained on CAMASA trajectories might reveal how real communication range and packet loss affect cooperative perception beyond what simulators assume.
  • The pseudonym reconciliation step offers a concrete case study for evaluating privacy mechanisms in live V2X networks.

Load-bearing premise

The described filtering, pseudonym reconciliation, and temporal normalization steps produce accurate trajectories without substantial reconstruction errors or bias introduced by the original CAM and DENM collection process.
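
To make the premise concrete, the temporal normalization can be pictured as resampling irregular CAM samples onto a uniform 0.1 s grid. The sketch below uses plain linear interpolation, which is an assumption: the paper reports a 10 Hz interpolation process but the summary here does not state the interpolation scheme. The function name and argument layout are hypothetical, and timestamps are assumed to be strictly increasing.

```python
import math


def resample_10hz(times, values, step=0.1):
    """Linearly interpolate (times, values) onto a uniform grid.

    times  : strictly increasing timestamps in seconds
    values : one scalar per timestamp (e.g. latitude, longitude, or speed)
    Returns a list of (t, value) pairs spaced `step` seconds apart,
    starting at times[0] and never extending past times[-1].
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    # Small epsilon guards against floating-point division artefacts.
    n_steps = int(math.floor((times[-1] - times[0]) / step + 1e-9))
    out = []
    j = 0  # index of the left neighbour of the current grid point
    for k in range(n_steps + 1):
        t = times[0] + k * step
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = min(max((t - t0) / (t1 - t0), 0.0), 1.0)
        out.append((t, values[j] + w * (values[j + 1] - values[j])))
    return out
```

Any such scheme fills the space between received messages with synthetic points, so the error it introduces grows with the gap between consecutive CAMs. That is one place where reconstruction bias could enter.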

What would settle it

A side-by-side comparison of a sample of reconstructed 10 Hz paths against independent high-accuracy GPS traces from the same vehicles would show whether position or speed errors exceed thresholds typical for motion forecasting benchmarks.
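
Such a comparison could be scored along the following lines. This is a hedged sketch under stated assumptions: both traces are assumed to be keyed by the same integer timestamps (for example, counts of 100 ms ticks), positions are WGS84 latitude and longitude in degrees, and the function names are hypothetical. No reference traces of this kind are reported for CAMASA.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def position_errors(reconstructed, reference):
    """Per-sample position error in metres over the shared timestamps.

    Both arguments map an integer timestamp to a (lat, lon) tuple.
    Timestamps present in only one trace are ignored.
    """
    common = sorted(set(reconstructed) & set(reference))
    return [haversine_m(*reconstructed[t], *reference[t]) for t in common]


def summarize(errors):
    """Mean, RMSE, and maximum of a non-empty list of errors."""
    if not errors:
        raise ValueError("no overlapping samples to compare")
    n = len(errors)
    return {
        "mean_m": sum(errors) / n,
        "rmse_m": math.sqrt(sum(e * e for e in errors) / n),
        "max_m": max(errors),
    }
```

The resulting figures would then be compared against whatever error tolerance the target forecasting benchmark assumes.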

Figures

Figures reproduced from arXiv: 2606.10641 by Angelo Porrello, Antonio Solida, Carlo Augusto Grazia, Gaetano Orazio Cauchi, Marco Savarese, Martin Klapez, Maurizio Casoni, Salvatore Iandolo.

Figure 1
Figure 1: MASA: real living-lab network.
Excerpt: …sion avoidance and thereby substantially improve safety. This research area has seen numerous advances in recent years. Among them, ForecastMAE [1] is a notable contribution that employs masked autoencoders to learn spatiotemporal representations of agent trajectories, achieving state-of-the-art performance in multi-modal prediction scenarios. Similarly, the work most closely…
Figure 2
Figure 2: CAM density map MASA.
Excerpt: II. RELATED WORKS. To the best of our knowledge, no publicly available dataset matches the scale, temporal continuity, or message density of the dataset presented in this work. Additionally, no publicly available CAM dataset provides a pseudonym-reconciled trajectory-level release. Existing contributions typically focus on smaller-scale deployments, shorter acquisition windows, synthet…
Figure 3
Figure 3: Vehicle trajectory via multiple RSUs.
Excerpt: …traffic jam increasing, and generic hazardous situation), and the number of unique station IDs associated with DENM generation. This complementary breakdown highlights the spatial concentration and typological distribution of safety-related events within the monitored urban area. The spatial characteristics of the dataset are shaped by the urban road topology and the R…
Figure 5
Figure 5: 10 Hz interpolation process.
Excerpt: …at 50 km/h within the same time interval) the two messages are attributed to the same vehicle. The value of 50 km/h was selected as it represents the maximum speed limit typically enforced in urban centers. The temporal threshold of 1.5 s was determined empirically: alternative values within the same range were tested, and 1.5 s yielded the best reconciliation accuracy in terms of t…
Figure 6
Figure 6: Dataset traces’ statistics.
Original abstract

Trajectory prediction is a key enabler of autonomous and cooperative driving systems. However, most existing benchmarks are either sensor-centric, geographically constrained, or based on synthetic mobility traces that do not capture real-world V2X communication dynamics. This paper introduces CAMASA, a large-scale infrastructure-based dataset derived from Cooperative Awareness Messages (CAMs) and Decentralized Environmental Notification Messages (DENMs) collected within the Modena Automotive Smart Area (MASA). The dataset comprises more than 40 million CAMs and 2 million DENMs recorded under authentic urban traffic conditions over multiple months. We present a rigorous preprocessing pipeline that includes filtering, pseudonym reconciliation to account for ETSI privacy-driven stationID changes, and temporal normalization to 10 Hz trajectories, suitable for motion forecasting and time-series analysis. With over 14,000 km of reconstructed vehicle paths and tens of thousands of unique station IDs, CAMASA provides a statistically significant empirical foundation for research on Cooperative Intelligent Transportation Systems (C-ITS). Beyond trajectory prediction, the dataset enables calibration of microscopic urban traffic simulators (e.g., SUMO) and supports the development of realistic Intelligent Transportation Systems (ITS) Digital Twins by jointly modeling mobility patterns and V2X communication coverage in real deployments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces CAMASA, a large-scale dataset of over 40 million CAMs and 2 million DENMs collected from the MASA living lab under real urban conditions. It describes a preprocessing pipeline (filtering, ETSI stationID pseudonym reconciliation, and 10 Hz temporal normalization) that yields more than 14,000 km of reconstructed vehicle trajectories, positioning the resource as an empirical foundation for C-ITS research, trajectory prediction, and ITS digital twin development.

Significance. If the reconstructed trajectories prove accurate, the dataset would supply a rare real-world V2X trace at scale, enabling better calibration of simulators like SUMO and more realistic modeling of mobility-communication interactions than synthetic or sensor-only benchmarks.

major comments (1)
  1. [Abstract] Abstract: The headline claim of a 'statistically significant empirical foundation' for C-ITS research rests on the fidelity of the >14,000 km of reconstructed paths. The described pipeline (filtering, pseudonym reconciliation, 10 Hz normalization) is presented without any accompanying validation: no position/velocity error distributions, no message-loss statistics, no consistency checks across overlapping CAMs, and no ground-truth comparisons. This absence directly weakens the central assertion that the trajectories faithfully represent real motion.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback and for recognizing the potential value of CAMASA for C-ITS research. We address the concern regarding validation of the reconstructed trajectories below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The headline claim of a 'statistically significant empirical foundation' for C-ITS research rests on the fidelity of the >14,000 km of reconstructed paths. The described pipeline (filtering, pseudonym reconciliation, 10 Hz normalization) is presented without any accompanying validation: no position/velocity error distributions, no message-loss statistics, no consistency checks across overlapping CAMs, and no ground-truth comparisons. This absence directly weakens the central assertion that the trajectories faithfully represent real motion.

    Authors: We agree that explicit validation strengthens the central claim. The current manuscript focuses on dataset collection and the preprocessing pipeline rather than quantitative fidelity assessment. Direct ground-truth comparisons are not feasible, as the data originates from an operational real-world V2X deployment without co-located high-precision reference sensors. However, we will revise the manuscript to include (i) message-loss statistics computed from the raw CAM stream, (ii) intra-vehicle consistency checks across temporally overlapping CAMs (e.g., position and velocity deltas), and (iii) aggregate position/velocity error distributions derived from the 10 Hz normalization step. These additions will be placed in a new subsection of the preprocessing pipeline description and will support the fidelity assertion without requiring external ground truth. revision: partial
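
One way the proposed message-loss statistic could be approximated from timestamps alone is to look for inter-arrival gaps longer than the maximum CAM generation interval, which ETSI EN 302 637-2 sets at 1000 ms. The sketch below is an illustration under that assumption, not the statistic the authors commit to. The function name, the tolerance value, and the input layout are hypothetical.

```python
import math

T_GEN_CAM_MAX_S = 1.0  # upper bound on the CAM generation interval (ETSI EN 302 637-2)


def gap_stats(timestamps, tolerance_s=0.05):
    """Summarize suspicious gaps in one station's CAM reception times.

    timestamps  : reception times in seconds for a single stationID
    tolerance_s : slack for jitter before a gap counts as too long (assumed value)
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    long_gaps = [g for g in gaps if g > T_GEN_CAM_MAX_S + tolerance_s]
    # A gap of length g hides at least ceil(g / T_max) - 1 messages,
    # since consecutive CAMs are at most T_max apart.
    min_missing = sum(math.ceil(g / T_GEN_CAM_MAX_S) - 1 for g in long_gaps)
    return {
        "n_gaps": len(gaps),
        "n_long_gaps": len(long_gaps),
        "min_missing": min_missing,
    }
```

The count is only a lower bound, and a long gap may also mean the vehicle left RSU coverage rather than that packets were lost, so the two causes would need to be separated before reporting a loss rate.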

standing simulated objections not resolved
  • Direct ground-truth comparisons against independent high-accuracy positioning systems are unavailable for this real-world operational dataset.

Circularity Check

0 steps flagged

No circularity: dataset paper with no derivations or predictions

full rationale

The paper is a descriptive introduction of a collected dataset (CAM/DENM messages from MASA living lab) and its basic preprocessing pipeline. No equations, fitted parameters, predictions, or derivation chains appear in the abstract or described content. The central claim is the existence and scale of the released data after filtering and normalization; this does not reduce to any self-referential input by construction. No self-citation load-bearing steps or ansatz smuggling are present. The contribution is self-contained as raw collection plus standard cleaning steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are required; the paper reports collection and basic cleaning of externally generated V2X messages.



Reference graph

Works this paper leans on

11 extracted references · 1 linked inside Pith

  [1] J. Cheng, X. Mei, and M. Liu, “Forecast-MAE: Self-supervised pre-training for motion forecasting with masked autoencoders,” Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

  [2] M. Grasselli, A. Porrello, and C. A. Grazia, “CAMNet: Leveraging Cooperative Awareness Messages for Vehicle Trajectory Prediction,” in 2026 IEEE 23rd Consumer Communications & Networking Conference (CCNC), 2026, pp. 1–6

  [3] B. Wilson et al., “Argoverse 2: Next generation datasets for self-driving perception and forecasting,” arXiv preprint arXiv:2301.00493, 2023

  [4] G. Kueppers et al., “V2AIX: A multi-modal real-world dataset of ETSI ITS V2X messages in public road traffic,” in 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2024, pp. 392–398

  [5] B. S. Bari, D. Puthal, and K. Yelamarthi, “Datasets in vehicular communication systems: A review of current trends and future prospects,” SN Computer Science, vol. 6, no. 3, 2025

  [6] J. Ferreira et al., “Dataset: Mobility Patterns of a Coastal Area Using Traffic Classification Radars,” Data, vol. 7, no. 7, p. 97, 2022

  [7] J. Ward et al., “The Warrigal Dataset: Multi-Vehicle Trajectories and V2V Communications,” IEEE Intelligent Transportation Systems Magazine, vol. 6, no. 3, pp. 109–117, 2014

  [8] S. Uppoor et al., “Generation and Analysis of a Large-Scale Urban Vehicular Mobility Dataset,” IEEE Transactions on Mobile Computing, vol. 13, no. 5, pp. 1061–1075, 2013

  [9] R. Xu et al., “OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 2583–2589

  [10] P. Sun et al., “Scalability in Perception for Autonomous Driving: Waymo Open Dataset,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020, pp. 2443–2451

  [11] A. Agarwal et al., “ITD: Indian Traffic Dataset for Intelligent Transportation Systems,” in 2024 16th International Conference on COMmunication Systems & NETworkS (COMSNETS). IEEE, 2024, pp. 842–850