CAMASA: A CAM-based Dataset from the MASA Living Lab

Angelo Porrello; Antonio Solida; Carlo Augusto Grazia; Gaetano Orazio Cauchi; Marco Savarese; Martin Klapez; Maurizio Casoni; Salvatore Iandolo

arxiv: 2606.10641 · v1 · pith:HTKXU7LLnew · submitted 2026-06-09 · 💻 cs.NI

Salvatore Iandolo, Marco Savarese, Gaetano Orazio Cauchi, Antonio Solida, Martin Klapez, Maurizio Casoni, Angelo Porrello, Carlo Augusto Grazia

Pith reviewed 2026-06-27 11:38 UTC · model grok-4.3

classification 💻 cs.NI

keywords CAMASA · dataset · C-ITS · trajectory prediction · V2X · CAM · DENM · urban mobility

The pith

CAMASA supplies over 14,000 km of real urban vehicle trajectories reconstructed from millions of V2X messages.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CAMASA as a large infrastructure-based dataset built from Cooperative Awareness Messages and Decentralized Environmental Notification Messages collected in the Modena Automotive Smart Area. It applies filtering, pseudonym reconciliation for station ID changes, and normalization to 10 Hz to produce usable trajectories under authentic traffic conditions. This addresses the shortage of real-world V2X data for trajectory prediction and related C-ITS tasks that synthetic traces or limited sensor datasets cannot fully replace. The resulting collection of more than 40 million CAMs and 2 million DENMs spans multiple months and tens of thousands of unique stations.

Core claim

CAMASA provides a statistically significant empirical foundation for Cooperative Intelligent Transportation Systems research by delivering over 14,000 km of reconstructed vehicle paths from authentic urban CAM and DENM collections, after a preprocessing pipeline that accounts for privacy-driven identifier changes and produces 10 Hz time-series suitable for motion forecasting and simulator calibration.

What carries the argument

The preprocessing pipeline that filters raw messages, reconciles pseudonyms across ETSI stationID changes, and normalizes trajectories to 10 Hz.
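
As a rough illustration of the final step, the Python sketch below resamples irregularly timed positions onto a 100 ms grid. The paper confirms only that trajectories are normalized to 10 Hz through an interpolation process. The choice of linear interpolation, the `resample_10hz` name, and the local metric `(x, y)` frame are all assumptions made for this example.

```python
from bisect import bisect_right

def resample_10hz(samples, step_ms=100):
    """Linearly interpolate (t_ms, x_m, y_m) samples onto a fixed time grid.

    Assumes `samples` is sorted by time with strictly increasing integer
    millisecond timestamps. Linear interpolation is an assumption here,
    not the paper's documented method.
    """
    if len(samples) < 2:
        return list(samples)
    times = [s[0] for s in samples]
    out = []
    t = times[0]
    while t <= times[-1]:
        # Index of the segment whose start is the last sample at or before t.
        i = min(bisect_right(times, t) - 1, len(samples) - 2)
        t0, x0, y0 = samples[i]
        t1, x1, y1 = samples[i + 1]
        w = (t - t0) / (t1 - t0)  # fractional position inside the segment
        out.append((t, x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
        t += step_ms
    return out

# Illustrative input only: three positions received 500 ms apart.
track = resample_10hz([(0, 0.0, 0.0), (500, 5.0, 0.0), (1000, 5.0, 10.0)])
print(len(track), "samples on the 100 ms grid")
```

Integer millisecond timestamps keep the grid free of floating-point drift. A real pipeline would probably also need a rule for gaps too long to interpolate across, which this sketch does not attempt.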

If this is right

Trajectory prediction models for autonomous and cooperative driving can be trained and evaluated on real V2X dynamics rather than synthetic traces.
Microscopic traffic simulators such as SUMO can be calibrated directly against observed urban mobility patterns and communication coverage.
ITS Digital Twins can jointly model vehicle movement and V2X message propagation using data from an actual deployment.
Time-series analysis techniques can be applied to study privacy-induced identifier changes and their impact on trajectory continuity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The dataset could support cross-city comparisons if similar CAM/DENM collections are gathered in other living labs with comparable infrastructure.
Models trained on CAMASA trajectories might reveal how real communication range and packet loss affect cooperative perception beyond what simulators assume.
The pseudonym reconciliation step offers a concrete case study for evaluating privacy mechanisms in live V2X networks.

Load-bearing premise

The described filtering, pseudonym reconciliation, and temporal normalization steps produce accurate trajectories without substantial reconstruction errors or bias introduced by the original CAM and DENM collection process.

What would settle it

A side-by-side comparison of a sample of reconstructed 10 Hz paths against independent high-accuracy GPS traces from the same vehicles would show whether position or speed errors exceed thresholds typical for motion forecasting benchmarks.
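
A minimal Python sketch of such a comparison follows. It assumes both traces are lists of `(t_ms, lat_deg, lon_deg)` tuples that share timestamps, and it reports only horizontal position error. The function names and the sample coordinates are hypothetical, and no such reference traces are described in the paper.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, adequate at city scale

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def position_errors(reconstructed, reference):
    """Horizontal error at every timestamp present in both traces."""
    ref = {t: (lat, lon) for t, lat, lon in reference}
    return [haversine_m(lat, lon, *ref[t])
            for t, lat, lon in reconstructed if t in ref]

def summarize(errors):
    """Mean, RMSE, and maximum of a non-empty list of errors in metres."""
    if not errors:
        raise ValueError("no overlapping timestamps to compare")
    n = len(errors)
    return {
        "n": n,
        "mean_m": sum(errors) / n,
        "rmse_m": math.sqrt(sum(e * e for e in errors) / n),
        "max_m": max(errors),
    }

# Made-up coordinates for illustration only, not taken from the dataset.
reconstructed = [(0, 44.64700, 10.92500), (100, 44.64701, 10.92502)]
reference = [(0, 44.64700, 10.92501), (100, 44.64701, 10.92501)]
print(summarize(position_errors(reconstructed, reference)))
```

The resulting statistics could then be compared against whatever error threshold a given motion forecasting benchmark tolerates.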

Figures

Figures reproduced from arXiv: 2606.10641 by Angelo Porrello, Antonio Solida, Carlo Augusto Grazia, Gaetano Orazio Cauchi, Marco Savarese, Martin Klapez, Maurizio Casoni, Salvatore Iandolo.

**Figure 1.** MASA: real living-lab network.
Excerpt from the surrounding text: …sion avoidance and thereby substantially improve safety. This research area has seen numerous advances in recent years. Among them, Forecast-MAE [1] is a notable contribution that employs masked autoencoders to learn spatiotemporal representations of agent trajectories, achieving state-of-the-art performance in multi-modal prediction scenarios. Similarly, the work most closely…

**Figure 2.** CAM density map MASA.
Excerpt from the surrounding text (Section II, Related Works): To the best of our knowledge, no publicly available dataset matches the scale, temporal continuity, or message density of the dataset presented in this work. Additionally, no publicly available CAM dataset provides a pseudonym-reconciled trajectory-level release. Existing contributions typically focus on smaller-scale deployments, shorter acquisition windows, synthet…

**Figure 3.** Vehicle trajectory via multiple RSUs.
Excerpt from the surrounding text: …traffic jam increasing, and generic hazardous situation), and the number of unique station IDs associated with DENM generation. This complementary breakdown highlights the spatial concentration and typological distribution of safety-related events within the monitored urban area. The spatial characteristics of the dataset are shaped by the urban road topology and the R…

**Figure 5.** 10 Hz interpolation process.
Excerpt from the surrounding text: …at 50 km/h within the same time interval) the two messages are attributed to the same vehicle. The value of 50 km/h was selected as it represents the maximum speed limit typically enforced in urban centers. The temporal threshold of 1.5 s was determined empirically: alternative values within the same range were tested, and 1.5 s yielded the best reconciliation accuracy in terms of t…

**Figure 6.** Dataset traces’ statistics.
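
The text captured alongside Figure 5 mentions a 50 km/h speed bound and a 1.5 s temporal threshold for attributing two messages to the same vehicle. The sentence is truncated, so the Python sketch below rests on one plausible reading: a new station ID continues an earlier one if its first message arrives within 1.5 s of the earlier ID's last message and lies no farther away than a vehicle at 50 km/h could travel in that interval. The greedy first-match chaining, the function names, and the local metric frame are assumptions, not the authors' implementation.

```python
import math

MAX_SPEED_MPS = 50 / 3.6  # 50 km/h, the urban limit quoted in the excerpt
MAX_GAP_S = 1.5           # temporal threshold quoted in the excerpt

def same_vehicle(last_msg, first_msg):
    """Decide whether first_msg can continue the track ending at last_msg.

    Each message is (t_s, x_m, y_m) in a local metric frame (assumed).
    """
    dt = first_msg[0] - last_msg[0]
    if not (0 < dt <= MAX_GAP_S):
        return False
    dist = math.hypot(first_msg[1] - last_msg[1], first_msg[2] - last_msg[2])
    return dist <= MAX_SPEED_MPS * dt

def reconcile(tracks):
    """Map each station ID to a vehicle label by greedy chaining.

    `tracks` maps station_id -> time-sorted list of (t_s, x_m, y_m).
    """
    order = sorted(tracks, key=lambda sid: tracks[sid][0][0])
    labels, tails = {}, {}  # tails: vehicle label -> last message seen
    next_label = 0
    for sid in order:
        first, last = tracks[sid][0], tracks[sid][-1]
        vehicle = next((v for v, tail in tails.items()
                        if same_vehicle(tail, first)), None)
        if vehicle is None:  # no plausible predecessor: start a new vehicle
            vehicle = next_label
            next_label += 1
        labels[sid] = vehicle
        tails[vehicle] = last
    return labels

# Hypothetical station IDs and positions, for illustration only.
tracks = {
    101: [(0.0, 0.0, 0.0), (1.0, 10.0, 0.0)],
    202: [(1.5, 15.0, 0.0), (2.5, 25.0, 0.0)],      # plausibly 101 after an ID change
    303: [(1.4, 500.0, 500.0), (2.0, 505.0, 500.0)],  # too far away to be 101
}
print(reconcile(tracks))
```

Greedy matching takes the first compatible predecessor. In dense traffic several candidates may satisfy both thresholds, and a production pipeline would need a tie-breaking rule that this sketch omits.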

Original abstract

Trajectory prediction is a key enabler of autonomous and cooperative driving systems. However, most existing benchmarks are either sensor-centric, geographically constrained, or based on synthetic mobility traces that do not capture real-world V2X communication dynamics. This paper introduces CAMASA, a large-scale infrastructure-based dataset derived from Cooperative Awareness Messages (CAMs) and Decentralized Environmental Notification Messages (DENMs) collected within the Modena Automotive Smart Area (MASA). The dataset comprises more than 40 million CAMs and 2 million DENMs recorded under authentic urban traffic conditions over multiple months. We present a rigorous preprocessing pipeline that includes filtering, pseudonym reconciliation to account for ETSI privacy-driven stationID changes, and temporal normalization to 10 Hz trajectories, suitable for motion forecasting and time-series analysis. With over 14,000 km of reconstructed vehicle paths and tens of thousands of unique station IDs, CAMASA provides a statistically significant empirical foundation for research on Cooperative Intelligent Transportation Systems (C-ITS). Beyond trajectory prediction, the dataset enables calibration of microscopic urban traffic simulators (e.g., SUMO) and supports the development of realistic Intelligent Transportation Systems (ITS) Digital Twins by jointly modeling mobility patterns and V2X communication coverage in real deployments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CAMASA is a new real-world V2X dataset from Modena at good scale, but the trajectory reconstructions lack any reported validation or error checks.

Dear colleague,

The main takeaway is that this paper releases CAMASA, a dataset of over 40 million CAMs and 2 million DENMs collected from the operational MASA living lab in Modena. After filtering, reconciling station IDs for ETSI privacy changes, and normalizing to 10 Hz, it yields more than 14,000 km of vehicle paths from real urban traffic. That collection and basic cleaning pipeline is the actual new element.

The paper does a solid job on the practical side. Its account of handling real ETSI messages, including the pseudonym reconciliation step, is useful because that issue trips up many people who try to work with live V2X data. Turning the raw messages into trajectories suitable for motion forecasting and for calibrating simulators such as SUMO is a reasonable goal, and data at this scale from an actual deployed system beats most synthetic traces.

The soft spot is the missing validation. The abstract lists the preprocessing steps but supplies no error metrics, message-loss statistics, consistency checks across overlapping messages, or comparison to ground truth. Without those, it is not possible to judge how much bias or reconstruction error comes from variable CAM rates, filtering thresholds, or the ID changes. The claim of a statistically significant empirical foundation for C-ITS research therefore rests on an assumption that the paths are accurate, and that assumption is not tested in the provided text.

This is aimed at researchers in trajectory prediction, C-ITS modeling, or traffic simulation who need real V2X traces rather than simulated ones. A reader who wants authentic urban conditions with communication data would find the raw scale helpful, provided they can access the data and run their own checks.

I would send it to peer review. Dataset papers need referees to verify data availability and ask for the missing validation numbers, but the core contribution is clear enough to warrant that step.

Referee Report

1 major / 0 minor

Summary. The paper introduces CAMASA, a large-scale dataset of over 40 million CAMs and 2 million DENMs collected from the MASA living lab under real urban conditions. It describes a preprocessing pipeline (filtering, ETSI stationID pseudonym reconciliation, and 10 Hz temporal normalization) that yields more than 14,000 km of reconstructed vehicle trajectories, positioning the resource as an empirical foundation for C-ITS research, trajectory prediction, and ITS digital twin development.

Significance. If the reconstructed trajectories prove accurate, the dataset would supply a rare real-world V2X trace at scale, enabling better calibration of simulators like SUMO and more realistic modeling of mobility-communication interactions than synthetic or sensor-only benchmarks.

major comments (1)

[Abstract] The headline claim of a 'statistically significant empirical foundation' for C-ITS research rests on the fidelity of the >14,000 km of reconstructed paths. The described pipeline (filtering, pseudonym reconciliation, 10 Hz normalization) is presented without any accompanying validation: no position/velocity error distributions, no message-loss statistics, no consistency checks across overlapping CAMs, and no ground-truth comparisons. This absence directly weakens the central assertion that the trajectories faithfully represent real motion.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback and for recognizing the potential value of CAMASA for C-ITS research. We address the concern regarding validation of the reconstructed trajectories below.

Referee: [Abstract] The headline claim of a 'statistically significant empirical foundation' for C-ITS research rests on the fidelity of the >14,000 km of reconstructed paths. The described pipeline (filtering, pseudonym reconciliation, 10 Hz normalization) is presented without any accompanying validation: no position/velocity error distributions, no message-loss statistics, no consistency checks across overlapping CAMs, and no ground-truth comparisons. This absence directly weakens the central assertion that the trajectories faithfully represent real motion.

Authors: We agree that explicit validation strengthens the central claim. The current manuscript focuses on dataset collection and the preprocessing pipeline rather than quantitative fidelity assessment. Direct ground-truth comparisons are not feasible, as the data originates from an operational real-world V2X deployment without co-located high-precision reference sensors. However, we will revise the manuscript to include (i) message-loss statistics computed from the raw CAM stream, (ii) intra-vehicle consistency checks across temporally overlapping CAMs (e.g., position and velocity deltas), and (iii) aggregate position/velocity error distributions derived from the 10 Hz normalization step. These additions will be placed in a new subsection of the preprocessing pipeline description and will support the fidelity assertion without requiring external ground truth.

revision: partial
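
Items (i) and (ii) of that response could look roughly like the Python sketch below. It is only illustrative. The 1 s bound is the maximum CAM generation interval in ETSI EN 302 637-2, so a longer gap between consecutive CAMs from one station suggests either a lost message or a vehicle outside coverage, and the sketch cannot tell those two apart. The function names, the 50 ms slack, and the assumption that speeds are already decoded to m/s in a local metric frame are hypothetical.

```python
import math

T_GEN_CAM_MAX_S = 1.0  # upper bound on the CAM generation interval (ETSI EN 302 637-2)

def gap_stats(timestamps_s, slack_s=0.05):
    """Count inter-CAM intervals longer than the maximum generation interval."""
    gaps = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    long_gaps = [g for g in gaps if g > T_GEN_CAM_MAX_S + slack_s]
    return {
        "intervals": len(gaps),
        "long_gaps": len(long_gaps),
        "max_gap_s": max(gaps, default=0.0),
    }

def speed_residuals(cams):
    """Displacement-implied speed minus mean reported speed, per CAM pair.

    `cams` is a time-sorted list of (t_s, x_m, y_m, speed_mps).
    Large residuals flag pairs whose positions and speeds disagree.
    """
    residuals = []
    for (t0, x0, y0, v0), (t1, x1, y1, v1) in zip(cams, cams[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        implied = math.hypot(x1 - x0, y1 - y0) / dt
        residuals.append(implied - (v0 + v1) / 2)
    return residuals

# Illustrative values only, not drawn from the dataset.
print(gap_stats([0.0, 0.5, 1.0, 3.0]))
print(speed_residuals([(0, 0, 0, 10), (1, 10, 0, 10), (2, 30, 0, 10)]))
```

Neither check replaces ground truth. Both measure only the internal consistency of the message stream, which matches the scope of what the simulated rebuttal offers.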

standing simulated objections not resolved

Direct ground-truth comparisons against independent high-accuracy positioning systems are unavailable for this real-world operational dataset.

Circularity Check

0 steps flagged

No circularity: dataset paper with no derivations or predictions

full rationale

The paper is a descriptive introduction of a collected dataset (CAM/DENM messages from MASA living lab) and its basic preprocessing pipeline. No equations, fitted parameters, predictions, or derivation chains appear in the abstract or described content. The central claim is the existence and scale of the released data after filtering and normalization; this does not reduce to any self-referential input by construction. No self-citation load-bearing steps or ansatz smuggling are present. The contribution is self-contained as raw collection plus standard cleaning steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are required; the paper reports collection and basic cleaning of externally generated V2X messages.

pith-pipeline@v0.9.1-grok · 5774 in / 1024 out tokens · 22065 ms · 2026-06-27T11:38:16.120853+00:00 · methodology

Reference graph

Works this paper leans on

11 extracted references · 1 linked inside Pith

[1] J. Cheng, X. Mei, and M. Liu, “Forecast-MAE: Self-supervised pre-training for motion forecasting with masked autoencoders,” Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.

[2] M. Grasselli, A. Porrello, and C. A. Grazia, “CAMNet: Leveraging Cooperative Awareness Messages for Vehicle Trajectory Prediction,” in 2026 IEEE 23rd Consumer Communications & Networking Conference (CCNC), 2026, pp. 1–6.

[3] B. Wilson et al., “Argoverse 2: Next generation datasets for self-driving perception and forecasting,” arXiv preprint arXiv:2301.00493, 2023.

[4] G. Kueppers et al., “V2AIX: A multi-modal real-world dataset of ETSI ITS V2X messages in public road traffic,” in 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2024, pp. 392–398.

[5] B. S. Bari, D. Puthal, and K. Yelamarthi, “Datasets in vehicular communication systems: A review of current trends and future prospects,” SN Computer Science, vol. 6, no. 3, 2025.

[6] J. Ferreira et al., “Dataset: Mobility Patterns of a Coastal Area Using Traffic Classification Radars,” Data, vol. 7, no. 7, p. 97, 2022.

[7] J. Ward et al., “The Warrigal Dataset: Multi-Vehicle Trajectories and V2V Communications,” IEEE Intelligent Transportation Systems Magazine, vol. 6, no. 3, pp. 109–117, 2014.

[8] S. Uppoor et al., “Generation and Analysis of a Large-Scale Urban Vehicular Mobility Dataset,” IEEE Transactions on Mobile Computing, vol. 13, no. 5, pp. 1061–1075, 2013.

[9] R. Xu et al., “OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 2583–2589.

[10] P. Sun et al., “Scalability in Perception for Autonomous Driving: Waymo Open Dataset,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020, pp. 2443–2451.

[11] A. Agarwal et al., “ITD: Indian Traffic Dataset for Intelligent Transportation Systems,” in 2024 16th International Conference on COMmunication Systems & NETworkS (COMSNETS). IEEE, 2024, pp. 842–850.
