Can Machine Learning Break Wi-Fi Privacy? A Study on MAC Address Randomization
Pith reviewed 2026-06-25 19:04 UTC · model grok-4.3
The pith
Machine learning identifies Wi-Fi devices despite MAC address randomization by clustering unencrypted hardware signals.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that bitwise decomposition of the HT capabilities field, when combined with inter-probe arrival times and multiple simulated RSSI values, supplies enough stable, unencrypted information for unsupervised clustering to re-identify devices even after MAC randomization. Of the three algorithms tested on a 22-device corpus, DBSCAN reaches the highest accuracy, 89.6 percent, which indicates that probe-frame metadata alone suffices for passive device tracking in the evaluated scenarios.
What carries the argument
Bitwise decomposition of the HT capabilities information field, which extracts per-bit hardware features from probe frames and feeds them into DBSCAN clustering together with IFAT and three SRSSI measurements.
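A minimal sketch of what such a decomposition could look like, assuming the 16-bit HT Capabilities Information field has already been parsed from a probe frame into an integer. The function names and the two example values are hypothetical, and the paper's exact feature layout may differ.

```python
HT_CAP_INFO_BITS = 16  # the HT Capabilities Information field is two octets


def decompose_ht_capabilities(ht_cap_info: int) -> list[int]:
    """Split the field into one 0/1 feature per bit, least-significant bit first."""
    if not 0 <= ht_cap_info < (1 << HT_CAP_INFO_BITS):
        raise ValueError("expected a 16-bit value")
    return [(ht_cap_info >> bit) & 1 for bit in range(HT_CAP_INFO_BITS)]


def hamming_distance(bits_a: list[int], bits_b: list[int]) -> int:
    """Count the capability bits on which two devices disagree."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


# Hypothetical field values for two devices (not taken from the paper).
dev_a = decompose_ht_capabilities(0x01AD)
dev_b = decompose_ht_capabilities(0x01EF)

print(dev_a)
print(hamming_distance(dev_a, dev_b))  # the two values differ in exactly two bits
```

Treated as plain integers, 0x01AD and 0x01EF differ by 66, a number with no hardware meaning. Treated as bit vectors, they differ in two specific capability flags, which is the kind of per-bit signal a distance-based clustering algorithm can use.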
If this is right
- MAC randomization as currently specified in IEEE 802.11 does not prevent device linkage by passive observers.
- Probe-frame metadata fields must be altered or randomized to restore privacy.
- Standardization bodies need to consider additional countermeasures beyond MAC address changes.
- Unsupervised clustering on hardware fingerprints can serve as a low-cost attack against randomized identifiers.
- Three signal-strength samples plus decomposed capabilities already yield usable identification rates.
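The clustering attack described in the list above can be sketched with a small, self-contained DBSCAN. This is an illustrative re-implementation in plain Python, not the authors' pipeline; a real experiment would more likely use a library implementation such as scikit-learn's `DBSCAN`. The frames, the four-bit capability prefix, the RSSI scaling factor, and the `eps` and `min_samples` values are all assumptions chosen for the example.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def neighbors(i):
        # Brute-force neighborhood query; includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


def to_feature(frame):
    """Four capability bits as-is, three RSSI values (dBm) scaled down by 10."""
    return tuple(frame[:4]) + tuple(r / 10 for r in frame[4:])


# Hypothetical probe frames: (bit0, bit1, bit2, bit3, rssi1, rssi2, rssi3).
frames = [
    (1, 0, 1, 1, -40, -55, -60), (1, 0, 1, 1, -41, -55, -59), (1, 0, 1, 1, -40, -56, -60),
    (1, 1, 1, 1, -42, -54, -61), (1, 1, 1, 1, -42, -55, -61), (1, 1, 1, 1, -43, -54, -60),
    (1, 0, 1, 1, -70, -45, -50), (1, 0, 1, 1, -71, -45, -51), (1, 0, 1, 1, -70, -46, -50),
]
labels = dbscan([to_feature(f) for f in frames], eps=0.5, min_samples=2)
print(labels)
```

In this toy data the first two devices have nearly identical signal strengths and are separated only by one capability bit, while the first and third share all bits and are separated only by signal strength. Neither feature family alone would split all three.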
Where Pith is reading between the lines
- The same decomposition and clustering approach could be applied to other management frames or to Bluetooth low-energy advertisements that also carry capability bits.
- If manufacturers begin randomizing the HT field bits, the accuracy reported here would serve as an upper bound that future privacy mechanisms must beat.
- Extending the method to include more than three RSSI samples or adding probe-request rate features might raise accuracy further in stable environments.
Load-bearing premise
The unencrypted hardware specifications, probe timing patterns, and signal-strength values stay stable enough across devices and settings to produce distinct clusters.
What would settle it
A controlled test in which the same 22 devices move through multiple real indoor and outdoor locations while recording probe frames, then measuring whether DBSCAN accuracy falls below 50 percent when RSSI varies naturally.
Original abstract
Medium Access Control (MAC) address randomization has been widely adopted during the IEEE 802.11 network discovery phase as a countermeasure against passive tracking. This paper exposes vulnerabilities in these privacy protocols by demonstrating that devices remain identifiable using Machine Learning (ML)-based fingerprinting. To study the potential tracking capabilities of a passive attacker, we evaluate different eavesdropping scenarios and configurations. To this end, we extract unencrypted hardware specifications from Probe Frames, which we combine with the Inter-Probe Frame Arrival Time (IFAT) and Simulated Received Signal Strength Indication (SRSSI) signals. A core contribution of this paper is the bitwise decomposition of the High Throughput (HT) capabilities information field, which improves device identification accuracy. We evaluate this de-randomization approach using three unsupervised clustering algorithms (K-Means, DBSCAN, and OPTICS) across a dataset of 22 devices from six manufacturers. Our results show that DBSCAN, when using decomposed HT capabilities information and three SRSSI measurements, achieves a global accuracy up to 89.6%. This suggests that the existing MAC randomization solutions are insufficient and underscores the need for enhancing privacy within Wi-Fi standardization.
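As a rough illustration of the IFAT feature named in the abstract, the sketch below takes capture timestamps for frames attributed to one randomized MAC address and returns the gaps between consecutive frames. Treating IFAT as a simple consecutive difference, using millisecond integers, and summarizing with the median are assumptions made here; the abstract does not give the exact definition.

```python
import statistics


def inter_frame_arrival_times(timestamps_ms):
    """Return the gaps between consecutive probe frames, in milliseconds."""
    ordered = sorted(timestamps_ms)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical capture: a burst of three frames, then a second burst a minute later.
timestamps_ms = [0, 20, 40, 60000, 60020]
ifat = inter_frame_arrival_times(timestamps_ms)

print(ifat)
print(statistics.median(ifat))  # the median is robust to the long gap between bursts
```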
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that MAC address randomization in Wi-Fi probe frames can be defeated by unsupervised ML clustering on unencrypted hardware fields (including a proposed bitwise decomposition of the HT capabilities information element), inter-probe frame arrival time (IFAT), and simulated RSSI (SRSSI). On a dataset of 22 devices from six manufacturers, DBSCAN using the decomposed HT features plus three SRSSI values reaches a global accuracy of 89.6%, suggesting that current randomization is insufficient.
Significance. If the experimental results are reproducible under realistic propagation conditions, the work would provide concrete evidence that passive attackers can track devices despite randomization, directly informing ongoing IEEE 802.11 privacy discussions. The bitwise HT decomposition is a concrete, low-cost feature-engineering contribution that could be adopted in other fingerprinting studies.
Major comments (3)
- [Abstract / Methods] The headline accuracy figure (89.6% global, DBSCAN with decomposed HT + three SRSSI) is stated without any description of the data-collection protocol, number of probe frames captured per device, capture duration, device placement, or statistical validation (e.g., cross-validation, bootstrap confidence intervals). This omission prevents assessment of whether the reported performance is robust or confounded by collection artifacts.
- [§4 / §5] §4 (Feature extraction) and §5 (Evaluation): The SRSSI feature is generated by simulation rather than measured RSSI. The manuscript does not quantify how the simulation models distance-dependent path loss, multipath fading, shadowing, or orientation effects; if these real-world distortions cause the three-dimensional SRSSI vectors of distinct devices to overlap, the DBSCAN clusters will merge and the 89.6% figure will not hold.
- [§5.3] §5.3 (Clustering results): The global accuracy metric is reported for a single configuration (decomposed HT + three SRSSI), but no ablation is shown that isolates the contribution of each component or tests sensitivity to the number of SRSSI samples. Without these controls it is unclear whether the claimed improvement is driven by the HT decomposition or by the simulated RSSI values.
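To make the second major comment concrete, here is a sketch of one standard way to simulate RSSI: the log-distance path-loss model with optional Gaussian (log-normal) shadowing. Whether the paper uses this model is not stated in the material above, and every parameter value below (reference RSSI, path-loss exponent, shadowing spread, sniffer distances) is a hypothetical choice.

```python
import math
import random


def simulated_rssi(distance_m, rssi_at_ref_dbm=-40.0, path_loss_exponent=3.0,
                   shadowing_sigma_db=0.0, ref_distance_m=1.0, rng=None):
    """Log-distance path loss: RSSI(d) = RSSI(d0) - 10 * n * log10(d / d0) + X_sigma."""
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    mean_rssi = rssi_at_ref_dbm - 10.0 * path_loss_exponent * math.log10(
        distance_m / ref_distance_m)
    if shadowing_sigma_db > 0:
        rng = rng or random.Random()
        return mean_rssi + rng.gauss(0.0, shadowing_sigma_db)  # shadowing term
    return mean_rssi


def srssi_vector(distances_m, **kwargs):
    """One simulated RSSI value per sniffer, given the device-to-sniffer distances."""
    return [simulated_rssi(d, **kwargs) for d in distances_m]


# A device at 1 m, 10 m, and 100 m from three hypothetical sniffers.
clean = srssi_vector([1.0, 10.0, 100.0])
noisy = srssi_vector([1.0, 10.0, 100.0], shadowing_sigma_db=6.0, rng=random.Random(7))

print(clean)
print(noisy)
```

With shadowing switched off, each device maps to a single fixed point in the three-dimensional SRSSI space, which makes clusters artificially tight. Raising `shadowing_sigma_db` spreads those points out, and that spread is the overlap risk the referee is asking the authors to quantify.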
Minor comments (2)
- [Methods] The manuscript should explicitly state the exact number of probe frames collected per device and the capture environment (anechoic chamber, office, etc.) so that the stability of IFAT and SRSSI can be evaluated.
- [§4.1] Notation for the decomposed HT bit fields should be defined in a table or equation rather than inline text to improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to enhance clarity and completeness of the manuscript.
- Referee: [Abstract / Methods] The headline accuracy figure (89.6% global, DBSCAN with decomposed HT + three SRSSI) is stated without any description of the data-collection protocol, number of probe frames captured per device, capture duration, device placement, or statistical validation (e.g., cross-validation, bootstrap confidence intervals). This omission prevents assessment of whether the reported performance is robust or confounded by collection artifacts.
  Authors: We agree that the data-collection protocol requires more explicit detail for reproducibility. The revised manuscript will expand the Methods section with a full description of the experimental setup, including the number of probe frames captured per device, capture durations, device placements, and the statistical validation methods used (cross-validation and bootstrap procedures).
  Revision: yes
- Referee: [§4 / §5] §4 (Feature extraction) and §5 (Evaluation): The SRSSI feature is generated by simulation rather than measured RSSI. The manuscript does not quantify how the simulation models distance-dependent path loss, multipath fading, shadowing, or orientation effects; if these real-world distortions cause the three-dimensional SRSSI vectors of distinct devices to overlap, the DBSCAN clusters will merge and the 89.6% figure will not hold.
  Authors: SRSSI is simulated to enable controlled evaluation across varied conditions. We will revise §4 to specify the simulation parameters for path loss, multipath fading, shadowing, and orientation, and will add discussion of how these affect vector overlap and clustering stability.
  Revision: yes
- Referee: [§5.3] §5.3 (Clustering results): The global accuracy metric is reported for a single configuration (decomposed HT + three SRSSI), but no ablation is shown that isolates the contribution of each component or tests sensitivity to the number of SRSSI samples. Without these controls it is unclear whether the claimed improvement is driven by the HT decomposition or by the simulated RSSI values.
  Authors: We concur that ablation and sensitivity analyses strengthen the claims. The revised §5.3 will include ablations isolating each feature (HT decomposition, IFAT, SRSSI) and tests varying the number of SRSSI samples to quantify their individual contributions.
  Revision: yes
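The bootstrap validation mentioned in the first exchange could, under simple assumptions, look like the percentile bootstrap below. It resamples per-frame correctness flags (1 if a frame was assigned to the right device, 0 otherwise) to put an interval around an accuracy figure. The function name, the resample count, and the example flags are hypothetical and are not results from the paper.

```python
import random


def bootstrap_accuracy_ci(correct_flags, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of 0/1 correctness flags."""
    rng = random.Random(seed)
    n = len(correct_flags)
    # Resample the frames with replacement and recompute accuracy each time.
    accuracies = sorted(
        sum(rng.choices(correct_flags, k=n)) / n for _ in range(n_resamples)
    )
    lower = accuracies[int((alpha / 2) * n_resamples)]
    upper = accuracies[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical outcome: 45 of 50 frames clustered with the correct device.
flags = [1] * 45 + [0] * 5
lower, upper = bootstrap_accuracy_ci(flags)
print(f"accuracy = {sum(flags) / len(flags):.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Resampling individual frames treats them as independent, which is optimistic when many frames come from the same device and session. Resampling whole devices or capture sessions would be a more conservative variant.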
Circularity Check
No circularity: empirical clustering accuracy on extracted features
The paper reports measured clustering accuracy (DBSCAN at 89.6%) from applying standard unsupervised algorithms to features extracted from a dataset of 22 devices. Feature extraction (decomposed HT capabilities, IFAT, SRSSI) and accuracy computation are direct experimental steps with no equations, fitted parameters, or self-citations that reduce the result to its inputs by construction. The result is framed as an outcome of the evaluation, not a derived prediction.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: hardware specifications, IFAT, and SRSSI extracted from probe frames are stable and device-unique even under MAC randomization.