Semantic Identification of IoT Devices from Behavioral Primitives

Hassan Habibi Gharakheili; Samuel Witt

arxiv: 2606.12793 · v1 · pith:TAQCTVZJnew · submitted 2026-06-11 · 💻 cs.CR · cs.IR


Pith reviewed 2026-06-27 06:52 UTC · model grok-4.3

classification 💻 cs.CR cs.IR

keywords: IoT device identification, MUD profiles, Access Control Entries, semantic matching, behavioral primitives, network security, traffic analysis


The pith

Semantic matching of behavioral primitives from MUD profiles identifies IoT devices more reliably than exact matching under variable conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that semantic representations built from individual Access Control Entries (ACEs) in Manufacturer Usage Description (MUD) profiles can identify IoT devices even when observed traffic includes unseen entries, drifted hostnames, or only partial data. This matters because low-level packet and flow methods suffer from variations across deployments and user interactions, while MUD profiles offer a more stable description of allowed behavior. ACE-level embeddings, built as geometric representations of the compact text of these behavioral primitives, preserve device distinctions better than whole-profile embeddings and stay effective after whitening calibration. Tests on 28 public profiles and more than 800,000 real flows show that semantic matching retains useful identification evidence where exact overlap becomes sparse or disappears, especially in early observation stages.

Core claim

Semantic ACE matching preserves useful identification evidence across conditions of unseen ACEs, drifted hostnames, and partial runtime observation, while exact ACE matching degrades sharply when overlap becomes sparse. This holds on 1,023 ACE instances from public profiles and on real traffic traces, with semantic methods frequently keeping the correct device among the highest-ranked candidates.
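To make the exact-matching baseline concrete, it can be read as a set-overlap score between observed and canonical ACE strings. The sketch below is a minimal illustration under that assumption; the function name `exact_overlap`, the string format, and the hostnames are hypothetical and are not taken from the paper.

```python
def exact_overlap(observed: set[str], profile: set[str]) -> float:
    """Fraction of observed ACE strings that appear verbatim in the profile."""
    if not observed:
        return 0.0
    return len(observed & profile) / len(observed)


# Hypothetical canonical profile for one device.
profile = {
    "from-device tcp api.vendor.example port 443",
    "from-device udp ntp.vendor.example port 123",
}

stable = set(profile)  # runtime traffic matches the profile exactly
one_drifted = {
    "from-device tcp api2.vendor.example port 443",  # hostname drifted
    "from-device udp ntp.vendor.example port 123",
}
all_drifted = {
    "from-device tcp api2.vendor.example port 443",
    "from-device udp time.vendor.example port 123",
}

for name, obs in [("stable", stable), ("one drifted", one_drifted), ("all drifted", all_drifted)]:
    print(name, exact_overlap(obs, profile))
```

A single changed character in a hostname removes that entry from the intersection, which is the kind of brittleness the claim attributes to exact matching once overlap becomes sparse.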

What carries the argument

Semantic ACE matching using geometric representations derived from compact behavioral text of Access Control Entries in MUD profiles.
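As an illustration of what "compact behavioral text" might look like, the sketch below renders one ACE as a short string built from the four fields the abstract names (protocol, endpoint, direction, port). The `Ace` class, its field names, and the text template are assumptions made for this example; the paper's actual template is not given in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ace:
    """Simplified, hypothetical view of one MUD Access Control Entry."""
    direction: str  # e.g. "from-device" or "to-device"
    protocol: str   # e.g. "tcp" or "udp"
    endpoint: str   # hostname or address the rule refers to
    port: int       # remote port


def ace_to_text(ace: Ace) -> str:
    """Render an ACE as a short text string with a fixed field order."""
    return f"{ace.direction} {ace.protocol} {ace.endpoint.lower()} port {ace.port}"


example = Ace("from-device", "tcp", "API.vendor.example", 443)
print(ace_to_text(example))
```

Keeping the field order fixed and lowercasing the hostname means two ACEs that differ only in formatting produce the same text, so any downstream embedding sees behavioral differences rather than serialization noise.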

If this is right

Device identification remains possible with partial runtime observations where exact matches fail.
Correct devices stay among top-ranked candidates under sparse overlap and drifted hostnames.
Semantic matching supplies stronger evidence than exact methods during early stages of traffic observation.
The representations remain effective after whitening calibration on both public profiles and real traces.
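The whitening calibration mentioned in the last point can be sketched as the standard transform that centers a set of embeddings and rescales them to identity covariance. This is a generic version written with NumPy on random toy vectors; whether the paper uses exactly this variant, and on which embedding dimensions, is an assumption.

```python
import numpy as np


def fit_whitening(X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (mean, W) such that (X - mean) @ W has identity covariance."""
    mu = X.mean(axis=0, keepdims=True)
    cov = np.cov(X - mu, rowvar=False)
    U, S, _ = np.linalg.svd(cov)
    # Small constant guards against division by zero for degenerate directions.
    W = U @ np.diag(1.0 / np.sqrt(S + 1e-12))
    return mu, W


def whiten(X: np.ndarray, mu: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Apply a previously fitted whitening transform."""
    return (X - mu) @ W


# Toy stand-in for ACE embeddings: 500 correlated 4-dimensional vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))

mu, W = fit_whitening(X)
Z = whiten(X, mu, W)
print(np.round(np.cov(Z, rowvar=False), 3))
```

After the transform, no single direction dominates cosine similarity, which is why checking that device distinctions survive whitening is a meaningful test of the representation.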

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could support real-time network policy enforcement by identifying devices from limited initial flows.
Similar semantic techniques on behavioral primitives might extend to anomaly detection or policy violation spotting.
Integration with dynamically updated MUD profiles could handle evolving device software versions.

Load-bearing premise

Compact behavioral text derived from ACEs produces geometric representations that preserve device-level distinctions more effectively than whole-profile embeddings.

What would settle it

A new dataset of IoT devices where semantic ACE matching ranks the correct device lower than exact matching under sparse-overlap conditions would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.12793 by Hassan Habibi Gharakheili, Samuel Witt.

**Figure 2.** Top-1 accuracy across the 26 device traces as a …

**Figure 3.** CDF of the rank assigned to the correct device …

Original abstract

Accurate identification of IoT devices is important for security management and policy enforcement. Existing approaches typically learn device signatures from packets or flow records. These methods operate on low-level communication observations whose traffic patterns may vary across deployments, software versions, and user interactions. This paper studies device identification using Manufacturer Usage Description (MUD) profiles. MUD profiles describe device behavior using Access Control Entries (ACEs), where each ACE represents a behavioral primitive consisting of protocol, endpoint, direction, and port semantics derived from device communication policy. Our contributions are threefold. First, using 28 publicly available MUD profiles containing 1,023 ACE instances, we construct ACE-level semantic representations from compact behavioral text and analyze their geometric properties. ACE-level representations preserve device-level behavioral distinctions more effectively than whole-profile embeddings and remain effective after whitening calibration. Second, we evaluate semantic ACE matching under controlled runtime variations, including unseen ACEs, drifted hostnames, and partial runtime observation. Exact ACE matching performs well when the overlap with the canonical MUD profile remains high, but degrades sharply when the overlap becomes sparse or disappears. In contrast, semantic ACE matching preserves useful identification evidence across these conditions. Third, we evaluate the same approach on real IoT traffic traces comprising more than 800,000 observed flows. Exact overlap remains the strongest signal when stable overlap exists, while semantic ACE matching provides stronger identification evidence during the early stages of observation, frequently retains the correct device among the highest-ranked candidates, and remains effective under sparse-overlap runtime traffic.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Semantic per-ACE embeddings keep device ID signal when exact MUD overlap thins or vanishes, and the three-part test on 28 profiles plus 800k flows shows that gap clearly.


The main point is that exact ACE overlap works when traffic matches the profile closely but collapses under unseen entries, hostname changes, or early partial observation, while semantic embeddings of the individual ACE text keep the correct device in the top ranks across those cases. They first take the 1023 ACEs from 28 public MUD profiles, turn each into compact behavioral text, embed it, and show the per-ACE vectors separate devices better than whole-profile embeddings even after whitening. That geometric check is the first piece.
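The per-ACE matching idea in this paragraph can be sketched with a deliberately simple stand-in for the embedding. Below, character-trigram counts play the role of the text embedding, and a device is scored by the mean, over observed ACEs, of the best cosine similarity to any ACE in its profile. The trigram representation, the scoring rule, and all device names and hostnames are assumptions for illustration only; the abstract does not specify the paper's embedding model or aggregation.

```python
from collections import Counter
from math import sqrt


def trigrams(text: str) -> Counter:
    """Character-trigram counts: a toy stand-in for a learned text embedding."""
    padded = f"  {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def device_score(observed: list[str], profile: list[str]) -> float:
    """Mean over observed ACEs of the best similarity to any profile ACE."""
    profile_vecs = [trigrams(p) for p in profile]
    best = [max(cosine(trigrams(o), p) for p in profile_vecs) for o in observed]
    return sum(best) / len(best)


# Hypothetical canonical profiles for two devices.
profiles = {
    "camera": ["from-device tcp stream.camvendor.example port 443",
               "from-device udp ntp.camvendor.example port 123"],
    "plug": ["from-device tcp cloud.plugmaker.example port 8883",
             "from-device udp dns.plugmaker.example port 53"],
}

# One observed ACE whose hostname has drifted, so exact overlap is zero.
observed = ["from-device tcp stream2.camvendor.example port 443"]

ranking = sorted(profiles, key=lambda d: device_score(observed, profiles[d]), reverse=True)
print(ranking)
```

The observed string matches no profile entry verbatim, yet the similarity-based score still places the camera first, which mirrors the behavior the letter describes for the low-overlap regime.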

The second piece runs controlled variations on the same profiles: hiding ACEs, drifting hostnames, and limiting observation length. Exact matching degrades as overlap drops; semantic matching does not. The third piece applies both methods to real traces with over 800k flows and finds the same pattern: exact matching is strongest with stable overlap, semantic matching is stronger early and under sparse conditions. The evaluation sits on external public profiles and real traces, so the numbers are not fitted inside the study.

What is new is the explicit reduction to per-ACE semantic matching and the direct test of the low-overlap regime where exact methods break down. The setup is straightforward and the data scale is reasonable for the claim.

The soft spot is that the abstract gives no detail on the embedding model, the distance function, or any statistical test on the ranking results. If the full paper uses off-the-shelf embeddings and reports proper significance checks, that gap closes; otherwise the central comparison rests on raw numbers without the method specifics. No other load-bearing issues appear in the argument.

This is for people building IoT policy tools or network management systems that already use MUD or similar descriptions. A reader who needs a practical way to handle partial or drifted traffic will get usable evidence. It deserves peer review because the evaluation design directly addresses the claimed advantage and the data are external.

Referee Report

2 major / 1 minor

Summary. The paper claims that semantic embeddings derived from compact behavioral text of Access Control Entries (ACEs) in MUD profiles enable robust IoT device identification. Using 28 public profiles (1023 ACEs), it shows ACE-level representations preserve device distinctions better than whole-profile embeddings (even after whitening). Controlled tests demonstrate semantic ACE matching retains useful evidence under unseen ACEs, hostname drift, and partial observation, while exact matching degrades with sparse overlap. On 800k real flows, semantic matching aids early-stage and sparse-overlap cases, with exact overlap strongest when stable overlap exists.

Significance. If the results hold, this offers a meaningful advance for IoT security by shifting from variable low-level traffic signatures to standardized, policy-derived behavioral primitives that are more resilient to deployment differences. The geometric analysis supplies an explicit rationale for preferring per-ACE text representations, the controlled-variation experiments directly probe the overlap-sparsity regime, and the scale of the real-traffic corpus (800k flows) provides a concrete empirical basis. The use of external public profiles and real traces avoids circularity in the evaluation.

major comments (2)

[Abstract] Abstract (first contribution paragraph): the claim that ACE-level representations 'preserve device-level behavioral distinctions more effectively than whole-profile embeddings' is load-bearing for the first contribution, yet the text provides no specification of the embedding model, vectorization technique, or distance metric used to produce or compare the geometric representations.
[Abstract] Abstract (second and third contribution paragraphs) and evaluation description: the reported superiority of semantic ACE matching under unseen ACEs, drifted hostnames, partial observation, and early-stage real traffic relies on unspecified details of how embeddings are generated, how matching/ranking is performed, and what statistical tests (if any) support the preservation of identification evidence; these omissions leave the central empirical claims only partially verifiable.

minor comments (1)

[Abstract] The abstract states '28 publicly available MUD profiles' but does not indicate how many distinct devices these profiles represent, which would improve clarity when discussing device-level identification performance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation for minor revision. The comments correctly identify that the abstract is insufficiently self-contained regarding methodological details. We address each point below and will revise the abstract (and, if needed, cross-references in the evaluation sections) to improve verifiability while preserving the paper's length and focus.


Referee: [Abstract] Abstract (first contribution paragraph): the claim that ACE-level representations 'preserve device-level behavioral distinctions more effectively than whole-profile embeddings' is load-bearing for the first contribution, yet the text provides no specification of the embedding model, vectorization technique, or distance metric used to produce or compare the geometric representations.

Authors: We agree that the abstract should briefly indicate the embedding approach to support the geometric claim. The full manuscript (Section 3) specifies the model, vectorization, and metric; the abstract will be revised to include a concise parenthetical reference to these choices so that the first contribution is verifiable from the abstract alone.
Revision: yes

Referee: [Abstract] Abstract (second and third contribution paragraphs) and evaluation description: the reported superiority of semantic ACE matching under unseen ACEs, drifted hostnames, partial observation, and early-stage real traffic relies on unspecified details of how embeddings are generated, how matching/ranking is performed, and what statistical tests (if any) support the preservation of identification evidence; these omissions leave the central empirical claims only partially verifiable.

Authors: The evaluation sections of the manuscript describe the embedding generation, matching procedure (top-k ranking by cosine similarity), and the use of rank-based metrics with bootstrap confidence intervals. However, the abstract and any high-level evaluation summary do not restate these elements. We will revise the abstract to include a short clause on the matching method and will ensure the evaluation description explicitly names the statistical support (bootstrap intervals) so the claims are fully verifiable without requiring the reader to locate the methods section.
Revision: yes
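The rank-based metrics and bootstrap intervals named in this simulated response can be sketched as follows. The example computes top-k accuracy from the rank assigned to the correct device in each trace and a percentile bootstrap interval around it. The rank values are made up for illustration, and the resampling scheme (2,000 resamples, percentile interval) is an assumption rather than the paper's stated procedure.

```python
import random


def top_k_accuracy(ranks: list[int], k: int) -> float:
    """Share of traces whose correct device is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def bootstrap_ci(ranks: list[int], k: int, n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap interval for top-k accuracy, resampling traces."""
    rng = random.Random(seed)
    stats = sorted(
        top_k_accuracy([rng.choice(ranks) for _ in ranks], k)
        for _ in range(n_boot)
    )
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Hypothetical rank of the correct device in ten traces (1 = ranked first).
ranks = [1, 1, 2, 1, 3, 1, 5, 1, 2, 1]

low, high = bootstrap_ci(ranks, k=1)
print("top-1:", top_k_accuracy(ranks, 1), "95% interval:", (low, high))
print("top-3:", top_k_accuracy(ranks, 3))
```

With only a few dozen device traces, intervals of this kind are wide, so reporting them alongside the point estimates is what makes a comparison between exact and semantic matching checkable.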

Circularity Check

0 steps flagged

No significant circularity


The paper's central claims rest on direct empirical evaluation of ACE-level semantic embeddings against 28 external public MUD profiles (1,023 ACEs) and 800k real flows. Geometric preservation, exact vs. semantic matching under controlled variations (unseen ACEs, hostname drift, partial observation), and ranking performance are computed from these independent datasets without any fitted parameters, self-definitional equations, or load-bearing self-citations that reduce the reported results to quantities defined inside the study. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that MUD ACEs encode stable behavioral primitives whose text descriptions yield geometrically meaningful vectors; no free parameters or invented entities are visible in the abstract.

axioms (1)

Domain assumption: MUD profiles describe device behavior using Access Control Entries that represent behavioral primitives consisting of protocol, endpoint, direction, and port semantics.
Stated in the opening paragraph of the abstract as the foundation for constructing semantic representations.


