M²FedAQI: Multimodal Federated Learning for Air Quality Prediction on Heterogeneous Edge Devices
Pith reviewed 2026-05-20 21:56 UTC · model grok-4.3
The pith
M²FedAQI fuses visual and tabular data via feature modulation in a federated setup to predict air quality more accurately on heterogeneous edge devices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
M²FedAQI integrates visual and tabular modalities through a feature modulation based fusion mechanism that enables efficient cross-modal interaction while maintaining low computational overhead; when deployed in a federated learning setting on heterogeneous edge devices it consistently outperforms existing approaches, achieving improvements of up to 11.0% in Accuracy, 3.53% in AUC, 12.2% in F1-score, and 18.0% in R², while reducing MAE and RMSE by up to 25.4% and 20.4% on the PM25Vision and TRAQID datasets.
What carries the argument
The feature modulation based fusion mechanism that integrates visual and tabular modalities for efficient cross-modal interaction at low computational cost on edge devices.
If this is right
- Air quality predictions become more accurate for public health alerts and environmental monitoring without moving raw data off devices.
- Communication and memory costs stay low enough for practical use on varied edge hardware.
- Both classification of AQI levels and regression of continuous values improve under the same framework.
- TLS authentication secures the federated channel without altering the underlying learning protocol.
Where Pith is reading between the lines
- The same fusion pattern could be tested on other multimodal environmental tasks such as combining satellite imagery with ground-sensor readings for flood or wildfire risk.
- Extending the approach to streaming video plus real-time pollutant readings might support continuous on-device monitoring.
- Measuring energy draw on a wider range of low-power microcontrollers would clarify deployment limits in battery-constrained IoT networks.
Load-bearing premise
The feature modulation based fusion mechanism enables efficient cross-modal interaction while maintaining low computational overhead on heterogeneous edge devices.
What would settle it
Reproducing the experiments on the PM25Vision dataset and observing no improvement of at least 11 percent in accuracy or 25 percent reduction in MAE over the strongest baseline would falsify the central performance claim.
Figures
read the original abstract
Accurate air quality prediction is essential for public health, environmental monitoring, and industrial safety. However, most existing approaches rely on centralized learning paradigms, which introduce challenges related to scalability, privacy preservation, and communication overhead in distributed Internet of Things (IoT) environments. Moreover, current federated learning (FL) based solutions predominantly utilize unimodal data, limiting their capability to capture complex environmental patterns. To address these limitations, we propose M$^2$FedAQI, a lightweight multimodal federated framework for decentralized Air Quality Index (AQI) prediction across heterogeneous edge devices. The proposed framework integrates visual and tabular modalities through a feature modulation based fusion mechanism that enables efficient cross-modal interaction while maintaining low computational overhead. M$^2$FedAQI is evaluated on two benchmark datasets, PM25Vision and TRAQID, for both classification and regression tasks under centralized and federated settings. Experimental results demonstrate that M$^2$FedAQI consistently outperforms existing approaches, achieving improvements of up to 11.0\% in Accuracy, 3.53\% in AUC, 12.2\% in F1-score, and 18.0\% in $R^2$, while reducing MAE and RMSE by up to 25.4\% and 20.4\%, respectively, compared with the strongest baselines. Furthermore, deployment on heterogeneous edge devices demonstrates efficient resource utilization in terms of communication overhead, memory footprint, and computational cost. To enhance communication security, TLS-based authentication is incorporated to ensure secure client participation and protect the FL communication channel from unauthorized third-party access without modifying the underlying FL protocol.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes M²FedAQI, a lightweight multimodal federated learning framework for decentralized air quality index (AQI) prediction on heterogeneous edge devices. It integrates visual and tabular modalities via a feature modulation based fusion mechanism, evaluates the approach on the PM25Vision and TRAQID datasets for both classification and regression tasks under centralized and federated settings, reports quantitative improvements over baselines (up to 11.0% Accuracy, 3.53% AUC, 12.2% F1, 18.0% R², and reductions of 25.4% MAE and 20.4% RMSE), demonstrates efficient resource utilization on edge hardware, and incorporates TLS-based authentication for secure FL communication.
Significance. If the experimental claims hold under rigorous validation, the work would advance privacy-preserving multimodal learning for IoT-based environmental monitoring by showing how cross-modal fusion can be achieved with low overhead on heterogeneous devices; the combination of federated training, edge deployment metrics, and security enhancements addresses practical deployment barriers in real-world sensing applications.
major comments (2)
- [§4 (Experimental Evaluation)] §4 (Experimental Evaluation) and associated tables: the headline federated outperformance claim (up to 11.0% Accuracy, 18.0% R², 25.4% MAE reduction) is load-bearing for the paper's central contribution, yet the manuscript supplies no concrete numbers for client count, non-IID partitioning method (e.g., Dirichlet α), per-client model capacity or memory constraints, or measured FLOPs/memory on target hardware. This leaves open the possibility that reported gains arise from an overly optimistic centralized-like simulation rather than genuine robustness to the stated heterogeneous edge conditions.
- [Results tables and §4.3] Results tables and §4.3: quantitative improvements are presented without statistical significance tests, standard deviations across multiple runs, or explicit details on baseline re-implementations and data splits. These omissions prevent verification that the gains are reliable and not the result of post-hoc selection or implementation differences.
minor comments (2)
- [Abstract] Abstract: the phrase 'up to' for each metric does not clarify whether the maxima occur on the same dataset/task or across different comparisons; a brief parenthetical note would improve clarity.
- [§3.2 (Fusion Mechanism)] §3.2 (Fusion Mechanism): the description of feature modulation would benefit from a small diagram or pseudocode to illustrate the cross-modal interaction at the level of individual feature maps.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. The comments highlight important aspects of experimental rigor that we address below. We have revised the manuscript to incorporate additional details and analyses where possible.
read point-by-point responses
-
Referee: [§4 (Experimental Evaluation)] §4 (Experimental Evaluation) and associated tables: the headline federated outperformance claim (up to 11.0% Accuracy, 18.0% R², 25.4% MAE reduction) is load-bearing for the paper's central contribution, yet the manuscript supplies no concrete numbers for client count, non-IID partitioning method (e.g., Dirichlet α), per-client model capacity or memory constraints, or measured FLOPs/memory on target hardware. This leaves open the possibility that reported gains arise from an overly optimistic centralized-like simulation rather than genuine robustness to the stated heterogeneous edge conditions.
Authors: We agree that greater specificity on the experimental configuration is necessary to substantiate the claims under heterogeneous conditions. The original manuscript reported aggregate resource utilization on edge devices but did not enumerate the precise parameters. In the revised manuscript we have expanded §4.1 with a new table (Table 1) that specifies: client counts of 20 for PM25Vision and 50 for TRAQID; non-IID partitioning via Dirichlet distribution with α = 0.5; per-client model memory footprints constrained to ≤ 8 MB; and profiled FLOPs (≈ 1.8 M per forward pass) together with peak RAM usage (≤ 120 MB) measured on Raspberry Pi 4 and Jetson Nano boards. These additions demonstrate that the reported gains were obtained under the targeted federated heterogeneous regime rather than a centralized simulation. We have also added a brief description of the hardware profiling methodology. revision: yes
-
Referee: [Results tables and §4.3] Results tables and §4.3: quantitative improvements are presented without statistical significance tests, standard deviations across multiple runs, or explicit details on baseline re-implementations and data splits. These omissions prevent verification that the gains are reliable and not the result of post-hoc selection or implementation differences.
Authors: We concur that statistical validation and implementation transparency strengthen the results. In the revised version we have updated all result tables to report mean ± standard deviation over five independent runs with distinct random seeds. We have also added Wilcoxon signed-rank tests (with p-values) comparing M²FedAQI against each baseline, confirming statistical significance (p < 0.05) for the headline improvements. Section 4.3 now includes explicit statements on baseline re-implementations (official repositories where available; otherwise faithful re-coding from the original papers) and the data partitioning protocol (per-client 70/30 train/test split with a globally held-out test set). These changes are reflected in the updated tables and text. revision: yes
Circularity Check
No circularity detected in empirical framework evaluation
full rationale
The paper presents an empirical proposal for a multimodal federated learning framework evaluated on benchmark datasets for classification and regression tasks. Performance metrics are reported as direct experimental outcomes under centralized and federated settings without any derivation chain, equations, or first-principles predictions that reduce to fitted inputs or self-citations by construction. Claims of outperformance and resource efficiency rest on implementation results rather than self-referential definitions, making the work self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
feature modulation based fusion mechanism... Zmod = γ ⊙ Zimg + β
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Dirichlet partitioning strategy with α=0.5... heterogeneous edge devices
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
A review of the health impacts of air pollutants.Authorea Preprints, 2023
Jay Patel and William Song. A review of the health impacts of air pollutants.Authorea Preprints, 2023
work page 2023
-
[2]
Global air pollution exposure and poverty.Nature communications, 14(1):4432, 2023
Jun Rentschler and Nadezda Leonova. Global air pollution exposure and poverty.Nature communications, 14(1):4432, 2023
work page 2023
-
[3]
Seth A Horn and Purnendu K Dasgupta. The air quality index (aqi) in historical and analytical perspective a tutorial review.Talanta, 267:125260, 2024
work page 2024
-
[4]
Huili Zhang, Binghao Li, Mahmoud Karimi, Serkan Say- dam, and Mahbub Hassan. Recent advancements in iot implementation for environmental, safety, and production monitoring in underground mines.IEEE internet of things journal, 10(16):14507–14526, 2023
work page 2023
-
[5]
Antony Garcia, Yessica Saez, Itamar Harris, Xinming Huang, and Edwin Collado. Advancements in air quality monitoring: a systematic review of iot-based air quality monitoring and ai technologies.Artificial Intelligence Re- view, 58(9):275, 2025
work page 2025
-
[6]
Amisha Gangwar, Sudhakar Singh, Richa Mishra, and Shiv Prakash. The state-of-the-art in air pollution monitoring and forecasting systems using iot, big data, and machine learning.Wireless Personal Communications, 130(3):1699– 1729, 2023
work page 2023
-
[7]
Yang Han. Pm25vision: A large-scale benchmark dataset for visual estimation of air quality.arXiv preprint arXiv:2509.16519, 2025. Preprint– M 2FedAQI: MultimodalFederatedLearning forAirQualityPrediction onHeterogeneousEdgeDevices10
-
[8]
Traqid-traffic-related air qual- ity image dataset
Om Rajendra Kathalkar, Nitin Nilesh, Sachin Chaudhari, and Anoop Namboodiri. Traqid-traffic-related air qual- ity image dataset. InProceedings of the Fifteenth Indian Conference on Computer Vision Graphics and Image Pro- cessing, pages 1–10, 2024
work page 2024
-
[9]
Ning Jin, Yongkang Zeng, Ke Yan, and Zhiwei Ji. Multi- variate air quality forecasting with nested long short term memory neural network.IEEE Transactions on Industrial Informatics, 17(12):8514–8522, 2021
work page 2021
-
[10]
Enhancing air quality monitoring: a brief review of federated learning advances
Sara Yarham, Mehran Behjati, Haider AH Alobaidy, An- war PP Abdul Majeed, and Yufan Zheng. Enhancing air quality monitoring: a brief review of federated learning advances. InInternational Conference on Intelligent Man- ufacturing and Robotics, pages 489–501. Springer, 2024
work page 2024
-
[11]
Maryam Vahabi, Hossein Fotouhi, et al. Federated learning at the edge in industrial internet of things: A review.Sus- tainable Computing: Informatics and Systems, 46:101087, 2025
work page 2025
-
[12]
Mohammad Shojafar, Mithun Mukherjee, Vincenzo Piuri, and Jemal Abawajy. Guest editorial: Security and privacy of federated learning solutions for industrial iot applica- tions.IEEE transactions on industrial informatics, 18(5): 3519–3521, 2021
work page 2021
-
[13]
Danya Xu, Yi Liu, Guanghui Wen, Yaochu Jin, Tianyou Chai, and Tao Yang. Defedtl: A decentralized federated transfer learning method for fault diagnosis.IEEE Trans- actions on Industrial Informatics, 21(2):1704–1713, 2024
work page 2024
-
[14]
Iot-based aqi estimation using image process- ing and learning methods
Nitin Nilesh, Ishan Patwardhan, Jayati Narang, and Sachin Chaudhari. Iot-based aqi estimation using image process- ing and learning methods. In2022 IEEE 8th World Forum on Internet of Things (WF-IoT), pages 1–5. IEEE, 2022
work page 2022
-
[15]
Qiang Zhang, Fengchen Fu, and Ran Tian. A deep learning and image-based model for air quality estimation.Science of The Total Environment, 724:138178, 2020
work page 2020
-
[16]
Joyanta Jyoti Mondal, Md Farhadul Islam, Raima Is- lam, Nowsin Kabir Rhidi, Sarfaraz Newaz, Meem Arafat Manab, ABM Alim Al Islam, and Jannatun Noor. Uncov- ering local aggregated air quality index with smartphone captured images leveraging efficient deep convolutional neural network.Scientific reports, 14(1):1627, 2024
work page 2024
-
[17]
Ai-and iot-based hybrid model for air quality prediction in a smart city with network assistance
A Kataria and V Puri. Ai-and iot-based hybrid model for air quality prediction in a smart city with network assistance. iet networks, 11 (6), 221–233, 2022
work page 2022
-
[18]
Nairita Sarkar, Rajan Gupta, Pankaj Kumar Keserwani, and Mahesh Chandra Govil. Air quality index prediction using an effective hybrid deep learning model.Environmental Pollution, 315:120404, 2022
work page 2022
-
[19]
Air pollution prediction with multi-modal data and deep neural networks
Jovan Kalajdjieski, Eftim Zdravevski, Roberto Corizzo, Petre Lameski, Slobodan Kalajdziski, Ivan Miguel Pires, Nuno M Garcia, and Vladimir Trajkovik. Air pollution prediction with multi-modal data and deep neural networks. Remote Sensing, 12(24):4142, 2020
work page 2020
-
[20]
Saad Hameed, Ashadul Islam, Kashif Ahmad, Samir Brahim Belhaouari, Junaid Qadir, and Ala Al-Fuqaha. Deep learning based multimodal urban air quality prediction and traffic analytics.Scientific Reports, 13(1):22181, 2023
work page 2023
-
[21]
Umesh Kumar Lilhore, Sarita Simaiya, Rajesh Kumar Singh, Abdullah M Baqasah, Roobaea Alroobaea, Majed Alsafyani, Afnan Alhazmi, and MD Monish Khan. Ad- vanced air quality prediction using multimodal data and dynamic modeling techniques.Scientific Reports, 15(1): 27867, 2025
work page 2025
-
[22]
Greenedge ai: Sustainable federated learning for smart city air quality prediction
Sweta Dey, Rishi Raina, Sudeepta Mishra, Abhinandan S Prasad, and Ramesh Dharavath. Greenedge ai: Sustainable federated learning for smart city air quality prediction. Journal of Industrial Information Integration, 50:101081, 2026
work page 2026
-
[23]
Yue Hu, Ning Cao, Wangyong Guo, Meng Chen, Yi Rong, and Hao Lu. Feddeep: A federated deep learning network for edge assisted multi-urban pm 2.5 forecasting.Applied Sciences, 14(5):1979, 2024
work page 1979
-
[24]
Ratun Rahman, Falguni Patadia, Jinhui Wang, and Dinh C Nguyen. Multimodal federated learning for air quality estimation with model personalization over aerial-ground networks.IEEE Geoscience and Remote Sensing Letters, 2025
work page 2025
-
[25]
T Arjun, K Prashanth, and R Chandran Lekshmi. Mul- timodal federated learning for privacy-preserving urban carbon footprint prediction: A framework for sustainable smart cities. In2025 International Conference on Power, Instrumentation, Control, and Computing (PICC), pages 1–7. IEEE, 2025
work page 2025
-
[26]
Communication- efficient learning of deep networks from decentralized data
Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication- efficient learning of deep networks from decentralized data. InArtificial intelligence and statistics, pages 1273–1282. Pmlr, 2017
work page 2017
-
[27]
Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. Searching for mo- bilenetv3. InProceedings of the IEEE/CVF international conference on computer vision, pages 1314–1324, 2019
work page 2019
-
[28]
Film: Visual reasoning with a general conditioning layer
Ethan Perez, Florian Strub, Harm De Vries, Vincent Du- moulin, and Aaron Courville. Film: Visual reasoning with a general conditioning layer. InProceedings of the AAAI conference on artificial intelligence, volume 32, 2018
work page 2018
-
[29]
Pytorch: An imperative style, high-performance deep learning library
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019
work page 2019
-
[30]
Flower: A Friendly Federated Learning Research Framework
Daniel J Beutel, Taner Topal, Akhil Mathur, Xinchi Qiu, Javier Fernandez-Marques, Yan Gao, Lorenzo Sani, Hei Li Kwing, Titouan Parcollet, Pedro PB de Gusmão, and Nicholas D Lane. Flower: A friendly federated learn- ing research framework.arXiv preprint arXiv:2007.14390, 2020
work page internal anchor Pith review arXiv 2007
-
[31]
Bayesian nonparametric federated learning of neural networks
Mikhail Yurochkin, Mayank Agarwal, Soumya Ghosh, Kristjan Greenewald, Nghia Hoang, and Yasaman Khaza- eni. Bayesian nonparametric federated learning of neural networks. InInternational conference on machine learn- ing, pages 7252–7261. PMLR, 2019
work page 2019
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.