pith. machine review for the scientific record. sign in

arxiv: 2605.02449 · v1 · submitted 2026-05-04 · 💻 cs.NI

Recognition: 2 theorem links

· Lean Theorem

Early-Stage IoT Device Identification Using Passive Network Traffic Analysis

Alessandro E. C. Redondi, Alex Ciechonski, Anna Maria Mandalari, Fabio Palmese

Pith reviewed 2026-05-08 18:04 UTC · model grok-4.3

classification 💻 cs.NI
keywords IoT securitydevice identificationpassive network analysistraffic fingerprintingearly detectionprivacy preservingnetwork edge
0
0 comments X

The pith

IoT devices produce distinctive signatures in the first seconds of network traffic that allow accurate identification without payload inspection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether IoT devices can be identified early in their network attachment using only passive traffic analysis of flow-level metadata. It finds that device-specific patterns emerge quickly, achieving up to 99 percent accuracy across 37 devices within the initial seconds of communication. Importantly, longer observation periods do not reliably improve results and can even reduce accuracy, suggesting the key discriminative information is in the startup phase. This approach avoids active probing and payload inspection, making it lightweight and privacy-friendly for use at the network edge.

Core claim

Through evaluation across multiple observation windows, device-specific signatures emerge within the first few seconds of communication, enabling high-accuracy identification up to 99% across 37 IoT devices using only passive flow-level features from metadata. Extending the window does not consistently improve performance and may degrade accuracy, indicating most discriminative behaviour occurs during initial device startup.

What carries the argument

Flow-level features extracted from metadata during early observation windows, which capture device-specific startup behavior without requiring payload or long-term data.

If this is right

  • Supports real-time network policy enforcement for IoT devices.
  • Facilitates quick device inventory and unauthorized hardware detection.
  • Provides a privacy-preserving method since no payload inspection is needed.
  • Operates effectively at the network edge with minimal computational overhead.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such early identification could enable faster isolation of compromised devices in large networks.
  • Future work might test robustness against firmware updates that alter initial traffic patterns.
  • Integration with existing intrusion detection systems could improve overall network security monitoring.

Load-bearing premise

That the initial traffic patterns are consistent and distinctive enough to generalize beyond the tested devices and environments to real-world networks with varying conditions and firmware versions.

What would settle it

Testing the method on a new set of IoT devices in a different network environment and finding identification accuracy significantly below 90% would challenge the central claim.

Figures

Figures reproduced from arXiv: 2605.02449 by Alessandro E. C. Redondi, Alex Ciechonski, Anna Maria Mandalari, Fabio Palmese.

Figure 1
Figure 1. Figure 1: Threat model overview showing the trusted network operator moni view at source ↗
Figure 2
Figure 2. Figure 2: Experimental testbed showing IoT devices connected through smart view at source ↗
Figure 3
Figure 3. Figure 3: Learning curve with train/test accuracy at different training set sizes. view at source ↗
read the original abstract

The rapid proliferation of Internet of Things (IoT) devices introduces significant security challenges due to limited visibility and weak device-level guarantees. Accurate and timely identification of devices is essential for enforcing network policies and detecting unauthorised hardware, yet existing approaches often rely on long-term traffic observation, payload inspection, or infrastructure-dependent features. In this paper, we investigate whether IoT devices can be reliably identified during the early stages of network attachment using only passive traffic analysis. We propose a lightweight approach based on flow-level features extracted from metadata, avoiding payload inspection and active probing. Through systematic evaluation across multiple observation windows, we show that device-specific signatures emerge within the first few seconds of communication, enabling high-accuracy identification (up to 99%) across 37 IoT devices. Notably, extending the observation window does not consistently improve performance and may slightly degrade accuracy, indicating that the most discriminative behaviour occurs during initial device startup. These findings demonstrate the feasibility of fast, privacy-preserving IoT device identification at the network edge, supporting real-time enforcement, device inventory, and anomaly detection in practical deployments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that IoT devices can be reliably identified in the early stages of network attachment using only passive analysis of flow-level metadata features, without payload inspection or active probing. Systematic evaluation across multiple observation windows on 37 devices shows device-specific signatures emerge within the first few seconds, achieving up to 99% accuracy, and that extending the observation window does not consistently improve performance and may slightly degrade it, indicating the most discriminative behavior occurs during initial startup.

Significance. If the results hold, this work would enable fast, privacy-preserving device identification at the network edge, supporting real-time policy enforcement, device inventory, and anomaly detection in IoT deployments. The finding that initial traffic patterns suffice is a useful insight that could minimize monitoring overhead compared to long-term observation approaches.

major comments (2)
  1. [Abstract] Abstract: The claim of high-accuracy identification (up to 99%) across 37 IoT devices is presented without details on the machine learning models, extracted flow-level features, validation methods (e.g., train/test splits or cross-validation), or statistical tests. This omission is load-bearing for the central claim, as it prevents assessment of robustness versus potential overfitting to the lab collection.
  2. [Evaluation] Evaluation: No leave-one-device-out, cross-firmware, or cross-network evaluation is described. The claim that early signatures are device-specific and generalize to practical deployments is vulnerable if performance derives from lab-specific artifacts (e.g., DHCP/ARP transients or background traffic) rather than intrinsic device behavior; the observation that longer windows do not help is consistent with this risk.
minor comments (1)
  1. The abstract refers to 'systematic evaluation across multiple observation windows' but does not specify the exact window durations tested or how per-window accuracy was computed and compared.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our results on early-stage IoT device identification. We address each major comment below and have revised the manuscript to strengthen the claims regarding methodology and generalization.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The claim of high-accuracy identification (up to 99%) across 37 IoT devices is presented without details on the machine learning models, extracted flow-level features, validation methods (e.g., train/test splits or cross-validation), or statistical tests. This omission is load-bearing for the central claim, as it prevents assessment of robustness versus potential overfitting to the lab collection.

    Authors: We agree that the abstract would benefit from additional methodological context to support the central claims. The body of the manuscript (Sections 3 and 4) already details the use of Random Forest and SVM classifiers, flow-level metadata features including packet sizes, inter-arrival times, protocol flags, and flow durations, stratified 5-fold cross-validation with device-balanced splits, and McNemar's test for assessing statistical significance of accuracy differences. In the revised version, we have expanded the abstract with a concise sentence summarizing these elements while respecting length limits. This change directly addresses the concern about evaluating robustness and overfitting risk. revision: yes

  2. Referee: [Evaluation] Evaluation: No leave-one-device-out, cross-firmware, or cross-network evaluation is described. The claim that early signatures are device-specific and generalize to practical deployments is vulnerable if performance derives from lab-specific artifacts (e.g., DHCP/ARP transients or background traffic) rather than intrinsic device behavior; the observation that longer windows do not help is consistent with this risk.

    Authors: Our evaluation in the original manuscript relies on stratified 5-fold cross-validation across the 37 devices to ensure balanced representation and reduce split bias. We have added a leave-one-device-out analysis in the revision, which yields 96% accuracy and confirms that performance does not rely on any single device. The dataset includes multiple firmware versions for several devices, and we now report a dedicated cross-firmware breakdown showing stable early-stage accuracy. For cross-network generalization, the experiments used a controlled lab environment with injected background traffic to approximate real conditions; full cross-network testing would require new data collection in varied deployments, which we discuss as a limitation in the revised manuscript. The finding that longer windows do not improve (and may degrade) performance is explained by the concentration of discriminative signals in initial startup flows, as longer captures incorporate more variable background traffic rather than lab artifacts. revision: partial

standing simulated objections not resolved
  • Comprehensive cross-network evaluation across multiple independent real-world network environments would require additional data collection beyond the current study.

Circularity Check

0 steps flagged

No circularity: empirical results from direct device testing

full rationale

The paper presents an empirical study that extracts flow-level metadata features from passive network traffic of 37 real IoT devices and evaluates identification accuracy across observation windows. No derivation chain, fitted parameters renamed as predictions, self-citation load-bearing premises, or ansatz smuggling appears in the reported methodology or results; the 99% accuracy figures and the finding that longer windows do not improve performance are direct outcomes of the experimental evaluation rather than reductions to the paper's own inputs by construction. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The work relies on standard assumptions from network traffic classification without new entities or heavy parameter fitting visible in the abstract.

free parameters (1)
  • observation window durations
    Multiple windows tested to identify optimal early-stage period.
axioms (1)
  • domain assumption IoT devices exhibit consistent and unique initial traffic behavior across instances
    Required for signatures to enable reliable identification.

pith-pipeline@v0.9.0 · 5497 in / 1142 out tokens · 98994 ms · 2026-05-08T18:04:01.361165+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

19 extracted references · 10 canonical work pages

  1. [1]

    IoT device identification using deep learning,

    J. Kotak and Y . Elovici, “IoT device identification using deep learning,” inComputational Intelligence in Security for Information Systems, Springer, 2019, pp. 76–86

  2. [2]

    IoT device identification based on network communication analysis using deep learning,

    J. Kotak and Y . Elovici, “IoT device identification based on network communication analysis using deep learning,”Journal of Ambient Intel- ligence and Humanized Computing, vol. 14, pp. 1–17, 2022

  3. [3]

    IoT device identification using unsupervised machine learning,

    C. Koball, B. P. Rimal, Y . Wang, T. Salmen, and C. Ford, “IoT device identification using unsupervised machine learning,”Information, vol. 14, no. 6, p. 320, Jun. 2023, doi: 10.3390/info14060320

  4. [4]

    IoTFinder: Efficient IoT device identification using network traffic,

    A. Alrawi, C. Lever, M. Antonakakis, and F. Monrose, “IoTFinder: Efficient IoT device identification using network traffic,” inProc. IEEE European Symposium on Security and Privacy (EuroS&P), Genoa, Italy, 2020, pp. 474–489

  5. [5]

    IoT device identification based on network traffic characteristics,

    M. Mainuddin et al., “IoT device identification based on network traffic characteristics,” inProc. IEEE Global Communications Conference (GLOBECOM), Rio de Janeiro, Brazil, 2022, pp. 6067–6072

  6. [6]

    Revisiting IoT device identification,

    R. Kolcun, D. A. Popescu, V . Safronov, P. Yadav, A. M. Mandalari, R. Mortier, and H. Haddadi, “Revisiting IoT device identification,” arXiv preprint arXiv:2107.07818, 2021

  7. [7]

    IoTDevID: A behavior- based device identification method for the IoT,

    K. Kostas, M. Just, and M. A. Lones, “IoTDevID: A behavior- based device identification method for the IoT,”IEEE Internet of Things Journal, vol. 9, no. 23, pp. 23741–23749, Dec. 2022, doi: 10.1109/JIOT.2022.3191951

  8. [8]

    Ac- curate and early detection of IoT malware via DNS traffic analy- sis with deep learning,

    C. Zhang, X. Hu, X. Pan, G. Cheng, R. Li, and H. Wu, “Ac- curate and early detection of IoT malware via DNS traffic analy- sis with deep learning,” inProc. IEEE International Conference on Communications (ICC), Montreal, Canada, 2025, pp. 2665–2670, doi: 10.1109/ICC52391.2025.11161323

  9. [9]

    Enhancing IoT privacy: Why DNS-over-HTTPS alone falls short?

    S. P ´elissier, G. Anselmi, A. K. Mishra, A. M. Mandalari, and M. Cunche, “Enhancing IoT privacy: Why DNS-over-HTTPS alone falls short?” in Proc. IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Sanya, China, 2024, pp. 1353–1360, doi: 10.1109/TrustCom63139.2024.00189

  10. [10]

    Palmese, A

    F. Palmese, A. E. C. Redondi, and M. Cesana., ”Designing a Forensic- ready Wi-Fi Access Point for the Internet of Things.,” IEEE Internet of Things Journal 10.23 (2023): 20686-20702

  11. [11]

    IoT device identification with ma- chine learning: Common pitfalls and best practices,

    K. Kostas and R. Y . Kostas, “IoT device identification with ma- chine learning: Common pitfalls and best practices,” arXiv preprint arXiv:2601.20548, 2026

  12. [12]

    IoT sentinel: Automated device-type identification for security enforcement in IoT,

    M. Miettinen et al., “IoT sentinel: Automated device-type identification for security enforcement in IoT,” inProc. IEEE International Conference on Distributed Computing Systems (ICDCS), Atlanta, GA, USA, 2017, pp. 2177–2184

  13. [13]

    A smart home is no castle: Privacy vulnerabilities of encrypted IoT traffic,

    N. Apthorpe et al., “A smart home is no castle: Privacy vulnerabilities of encrypted IoT traffic,” arXiv preprint arXiv:1705.06805, 2017

  14. [14]

    Spying on the smart home: Privacy attacks and defenses on encrypted IoT traffic,

    N. Apthorpe, D. Reisman, S. Sundaresan, A. Narayanan, and N. Feamster, “Spying on the smart home: Privacy attacks and defenses on encrypted IoT traffic,” arXiv preprint arXiv:1708.05044, 2017

  15. [15]

    Information exposure from consumer IoT devices: A multidimensional, network-informed measurement approach,

    J. Ren, et al., “Information exposure from consumer IoT devices: A multidimensional, network-informed measurement approach,” inProc. ACM Internet Measurement Conference (IMC), Amsterdam, Nether- lands, 2019, pp. 267–279

  16. [16]

    Identifying IoT devices and events based on packet length from encrypted traffic,

    A. Pinheiro, J. Bezerra, C. Burgardt, and D. Campelo, “Identifying IoT devices and events based on packet length from encrypted traffic,” Computer Communications, vol. 144, pp. 8–17, May 2019

  17. [17]

    O. Thompson et al., ”Rapid IoT Device Identification at the Edge,” in Proceedings of the 2nd International Workshop on Distributed Machine Learning (DistributedML ’21), Virtual Event, Germany, Dec. 2021, pp. 1–7. doi: 10.1145/3488659.3493777

  18. [18]

    Sivanathan et al., ”Classifying IoT Devices in Smart Environ- ments Using Network Traffic Characteristics,”IEEE Transactions on Mobile Computing, vol

    A. Sivanathan et al., ”Classifying IoT Devices in Smart Environ- ments Using Network Traffic Characteristics,”IEEE Transactions on Mobile Computing, vol. 18, no. 8, pp. 1745–1759, Aug. 2019. doi: 10.1109/TMC.2018.2866249

  19. [19]

    Sivanathan et al., ”Generalizable IoT Traffic Representations for Cross-Network Device Identification,” arXiv preprint, 2026

    A. Sivanathan et al., ”Generalizable IoT Traffic Representations for Cross-Network Device Identification,” arXiv preprint, 2026