PersonaFingerprint: Measuring Persona Inference on Modern Websites with LLM-Driven Browsing
Pith reviewed 2026-05-20 17:35 UTC · model grok-4.3
The pith
An adversary can infer a user's persona from packet lengths and timings in encrypted web traffic.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors show that persona information is already latent in representations learned by standard website fingerprinting models and can be extracted or amplified from packet-length and inter-arrival-time sequences collected on modern websites. They generate traffic traces with an LLM-driven multi-agent browsing framework that enforces controllable persona constraints, and they formalize inference under closed-set and open-world settings. Persona inference reaches approximately 84 percent accuracy on mixed-site traffic, and a multi-task objective reaches around 80 percent persona accuracy with only a modest loss in site classification performance.
What carries the argument
An LLM-driven multi-agent browsing framework enforces controllable persona constraints while a computer-use agent interacts with real websites, producing the encrypted traffic traces on which every result rests.
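The adversary's input, packet-length and inter-arrival-time sequences, can be sketched as below. This is a minimal illustration and not the paper's pipeline: it assumes a trace is a list of (timestamp, signed length) pairs where the sign encodes direction, and the function name `trace_to_features` and the `max_len` parameter are hypothetical.

```python
from typing import List, Tuple


def trace_to_features(
    trace: List[Tuple[float, int]], max_len: int = 8
) -> Tuple[List[int], List[float]]:
    """Convert (timestamp_seconds, signed_packet_length) pairs into two
    fixed-length sequences: packet lengths and inter-arrival times.

    Assumption: a positive length means outgoing, a negative length incoming.
    """
    # Order packets by capture time so the differences are meaningful.
    ordered = sorted(trace, key=lambda p: p[0])
    lengths = [size for _, size in ordered]
    # The first packet has no predecessor, so its inter-arrival time is 0.
    iats = [0.0] + [
        ordered[i][0] - ordered[i - 1][0] for i in range(1, len(ordered))
    ]
    # Zero-pad short traces and truncate long ones to a fixed length.
    lengths = (lengths + [0] * max_len)[:max_len]
    iats = (iats + [0.0] * max_len)[:max_len]
    return lengths, iats
```

Fixed-length padding is a common convention for feeding traces to sequence classifiers; the length the authors actually use is not stated in this summary.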
If this is right
- Persona inference reaches about 84 percent accuracy on mixed-site traffic across ten modern websites and fifteen personas.
- Persona information already exists inside standard website fingerprinting models and can be amplified at low cost.
- A multi-task training objective achieves around 80 percent persona accuracy while retaining strong site classification performance (the abstract cites a baseline of about 93 percent).
- Encrypted traffic metadata leaks not only the visited site but also how the user browses and who is browsing.
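One common way to write a multi-task objective of the kind mentioned above is a weighted sum of two cross-entropy terms, one per classification head. The sketch below assumes that form; this summary does not state the paper's exact loss, and the names `multitask_loss` and `persona_weight` are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one example, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(
    site_logits: Sequence[float],
    site_label: int,
    persona_logits: Sequence[float],
    persona_label: int,
    persona_weight: float = 0.5,
) -> float:
    """Site loss plus a weighted persona loss (assumed form, not the paper's)."""
    site_term = cross_entropy(site_logits, site_label)
    persona_term = cross_entropy(persona_logits, persona_label)
    return site_term + persona_weight * persona_term
```

Raising `persona_weight` pushes the shared representation toward persona features at some cost to site accuracy, which is the trade-off the bullet describes.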
Where Pith is reading between the lines
- Defenses that only pad packet sizes or add dummy traffic may still leave timing patterns that expose persona.
- The same metadata could be used to infer other user attributes such as age group or device type beyond the tested personas.
- Privacy tools and anonymization networks may need to incorporate explicit randomization of inter-arrival times to limit these leaks.
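The timing randomization suggested in the last point could look like the sketch below. It is an illustration of the idea only, not a defense evaluated in the paper, and nothing here shows that it would reduce persona leakage. The function name `jitter_timestamps` and the uniform delay distribution are assumptions.

```python
import random
from typing import List, Optional


def jitter_timestamps(
    timestamps: List[float], max_delay: float, seed: Optional[int] = None
) -> List[float]:
    """Add a random extra gap before each packet.

    Delays accumulate, so packet order is preserved and every inter-arrival
    time grows by a value drawn uniformly from [0, max_delay]. Packets are
    only ever delayed, never sent early. Input must be sorted ascending.
    """
    if max_delay < 0:
        raise ValueError("max_delay must be non-negative")
    rng = random.Random(seed)
    shifted = []
    total_shift = 0.0
    for t in timestamps:
        total_shift += rng.uniform(0.0, max_delay)
        shifted.append(t + total_shift)
    return shifted
```

Because the delay is cumulative, the total added latency grows with trace length, which is the usual cost of timing defenses.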
Load-bearing premise
The traffic traces generated under enforced persona constraints accurately represent the behavior of actual users without introducing simulation artifacts.
What would settle it
A side-by-side test that measures persona inference accuracy on traffic from real human users following the same persona instructions on the same sites, then compares it with the reported simulation results.
Original abstract
Website Fingerprinting (WFP) has traditionally focused on inferring which website a user visits from encrypted traffic metadata such as packet sizes and timing. In this paper, we identify and quantify a new privacy risk in modern web settings: an adversary can infer a user's persona using only packet-length and inter-arrival-time sequences. To study this risk at scale, we build an LLM-driven multi-agent browsing framework that enforces controllable persona constraints while a computer-use agent interacts with real websites and collects corresponding encrypted traffic traces. We formalize persona fingerprinting under both closed-set and open-world settings and further evaluate whether persona information is already embedded in representations learned by existing WFP models and can be amplified at low cost. Across 10 modern websites and 15 personas (plus an open-world class), persona inference achieves about 84% accuracy on mixed-site traffic; moreover, a lightweight multi-task objective can boost persona accuracy to around 80% while retaining strong site classification performance (about 93% baseline). Our results show that, on modern websites, encrypted traffic metadata can leak not only which site a user visits, but also how they browse and who is browsing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that an adversary can infer a user's persona from encrypted web traffic using only packet-length and inter-arrival-time sequences. It introduces an LLM-driven multi-agent browsing framework that enforces controllable persona constraints while interacting with real websites to generate traces, formalizes persona fingerprinting in closed-set and open-world settings, and reports ~84% persona inference accuracy on mixed-site traffic across 10 sites and 15 personas. It further reports that a lightweight multi-task objective can boost persona accuracy to ~80% while retaining strong site classification performance (~93% baseline).
Significance. If the simulated traces prove representative of real users, the work identifies a meaningful extension of website fingerprinting to user attribute inference, with implications for privacy in encrypted modern web traffic. The LLM multi-agent framework for scalable, controllable trace generation is a methodological strength that enables systematic study at this scale.
major comments (2)
- [Abstract and Evaluation] Abstract and §5 (Evaluation): The reported ~84% persona inference accuracy and ~80% boosted accuracy lack any details on trace counts per persona/site, number of runs, statistical significance tests, error bars, or controls for confounds such as site-specific traffic patterns. This information is required to assess whether the figures reliably support the central privacy-risk claim.
- [Framework and Trace Collection] §3 (LLM-driven multi-agent framework) and §4 (Trace collection): The central claim rests on traces generated by enforcing persona constraints via LLM agents. No comparison or statistical matching (e.g., Kolmogorov-Smirnov tests on packet-size or IAT distributions) against human-collected traces for the same personas is described, leaving open whether accuracies reflect genuine metadata leakage or simulation artifacts such as reduced behavioral variability.
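The Kolmogorov-Smirnov comparison named in the second major comment can be sketched in a few lines. This is a minimal pure-Python version of the two-sample statistic only; the function name `ks_statistic` is hypothetical, and the inputs would be, for example, inter-arrival times from simulated and human traces.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of samples a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    largest_gap = 0.0
    # Empirical CDFs are step functions, so the supremum of their
    # difference is attained at one of the observed values.
    for x in set(sa) | set(sb):
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

A value near 0 means the two distributions look alike and a value near 1 means they barely overlap. For a p-value, a library routine such as `scipy.stats.ks_2samp` would be the usual choice.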
minor comments (1)
- [Abstract] Abstract: The open-world class handling and how 'mixed-site traffic' is constructed could be stated more explicitly to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each of the major comments below and outline the revisions we will make to strengthen the paper.
- Referee: [Abstract and Evaluation] Abstract and §5 (Evaluation): The reported ~84% persona inference accuracy and ~80% boosted accuracy lack any details on trace counts per persona/site, number of runs, statistical significance tests, error bars, or controls for confounds such as site-specific traffic patterns. This information is required to assess whether the figures reliably support the central privacy-risk claim.
  Authors: We agree that the evaluation section would benefit from more detailed reporting to support the claims. In the revised manuscript, we will expand §5 to include a table with the number of traces collected per persona and per site, specify the number of independent runs performed, report accuracies with error bars representing standard deviation across runs, include results of statistical significance tests (e.g., t-tests against random baselines), and provide per-site accuracy breakdowns to address potential confounds from site-specific traffic patterns.
  Revision: yes
- Referee: [Framework and Trace Collection] §3 (LLM-driven multi-agent framework) and §4 (Trace collection): The central claim rests on traces generated by enforcing persona constraints via LLM agents. No comparison or statistical matching (e.g., Kolmogorov-Smirnov tests on packet-size or IAT distributions) against human-collected traces for the same personas is described, leaving open whether accuracies reflect genuine metadata leakage or simulation artifacts such as reduced behavioral variability.
  Authors: The referee raises a valid point regarding the validation of our simulated traces. Our current work relies on the LLM multi-agent framework to generate controllable and diverse persona-driven interactions with real websites, which we believe captures realistic behavioral patterns. However, we did not include direct statistical comparisons with human traces. We will revise the manuscript to add a discussion in the limitations section explicitly acknowledging this and outlining plans for future human validation studies. We maintain that the use of live website interactions mitigates some simulation artifacts, but we will not claim equivalence without such comparisons.
  Revision: partial
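The reporting promised in the first response (error bars across runs and a t-test against a random baseline) can be sketched as follows. The function name `summarize_runs` is hypothetical, and a chance level of 1/16 would hold only if the 15 personas plus the open-world class were equally frequent, which this summary does not confirm.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def summarize_runs(
    accuracies: Sequence[float], chance: float
) -> Tuple[float, float, Optional[float]]:
    """Return (mean, sample standard deviation, one-sample t statistic).

    The t statistic tests the mean accuracy against `chance`. It is None
    when all runs are identical, because the standard error is then zero.
    """
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample (n - 1) standard deviation
    if std == 0:
        return mean, std, None
    t_stat = (mean - chance) / (std / math.sqrt(n))
    return mean, std, t_stat
```

The t statistic has n - 1 degrees of freedom. With only a handful of runs, per-run accuracies are worth reporting alongside the summary.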
Circularity Check
No circularity; empirical evaluation on generated traces is self-contained
The paper builds an LLM multi-agent framework to produce labeled traffic traces under explicit persona constraints, then trains and evaluates standard classifiers on packet-length and timing features extracted from those traces. No derivation step reduces a claimed result to its own inputs by construction, no fitted parameter is relabeled as a prediction, and no load-bearing premise rests on a self-citation chain or imported uniqueness theorem. The reported accuracies (approximately 84% persona inference, 93% site classification) are computed directly on held-out portions of the simulated dataset; the central claim therefore remains an empirical measurement within the generated distribution rather than a tautological restatement of the simulation rules.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLM agents can enforce controllable persona constraints that generate distinguishable and realistic encrypted traffic traces
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Passage: We build an LLM-driven multi-agent browsing framework that enforces controllable persona constraints while a computer-use agent interacts with real websites and collects corresponding encrypted traffic traces... persona inference achieves about 84% accuracy on mixed-site traffic
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.