arxiv: 2605.11606 · v1 · submitted 2026-05-12 · 💻 cs.CR · cs.NI


Convolutional-Neural-Networks for Deanonymisation of I2P Traffic

Dieter Arnold, Konrad Baechler, Luca Rohrer

Pith reviewed 2026-05-13 01:28 UTC · model grok-4.3

classification 💻 cs.CR cs.NI

keywords: I2P network, deanonymization, convolutional neural networks, traffic analysis, anonymity, mix networks, passive attacks, machine learning


The pith

Machine learning models fail to deanonymize I2P services from encrypted traffic patterns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether convolutional neural networks can identify specific services or users inside the I2P anonymity network by examining only the observable flow of encrypted packets. Researchers first built a controlled laboratory setup that generates synthetic I2P traffic to create labeled training data, then trained CNN models on timing, size, and direction features of those flows. They supplemented the experiments with a theoretical analysis that applies Fano's inequality to bound how much identifying information can leak through any mix network of this type. When the same models were run on both the synthetic data and separate real-world I2P traces, classification accuracy stayed too low to achieve reliable deanonymization. The outcome indicates that I2P's existing protections continue to hold against this class of passive machine-learning attack.
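The per-packet features named above (timing, size, and direction) could be turned into a fixed-length sequence before being passed to a 1D CNN. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `Packet` record, the `encode_flow` helper, the default sequence length of 8, and the normalisation by a 1500-byte MTU are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    size: int         # bytes on the wire
    outgoing: bool    # True if sent by the observed node


def encode_flow(packets: List[Packet], seq_len: int = 8, mtu: int = 1500) -> List[List[float]]:
    """Encode a flow as seq_len rows of [inter-arrival time, signed normalised size].

    The sign of the second column carries direction (+ outgoing, - incoming).
    Flows are truncated or zero-padded to exactly seq_len rows.
    All parameters are illustrative assumptions, not values from the paper.
    """
    rows: List[List[float]] = []
    prev_ts = None
    for pkt in packets[:seq_len]:
        delta = 0.0 if prev_ts is None else pkt.timestamp - prev_ts
        prev_ts = pkt.timestamp
        sign = 1.0 if pkt.outgoing else -1.0
        rows.append([delta, sign * min(pkt.size, mtu) / mtu])
    # Zero-pad short flows so every sample has the same shape.
    rows.extend([0.0, 0.0] for _ in range(seq_len - len(rows)))
    return rows
```

Folding direction into the sign of the size keeps the input to two channels; a separate direction channel would be an equally reasonable design.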

Core claim

The authors show that convolutional neural networks applied to passive I2P traffic metadata cannot reliably map flows back to individual services or destinations. A laboratory environment supplied synthetic traffic for model training, while Fano's inequality supplied a theoretical upper bound on the information that can be extracted from anonymous mix-network transmissions. Real-world validation runs confirmed that the trained models produced no practically useful identifications, leaving the network's anonymity guarantees intact.

What carries the argument

Convolutional neural networks trained on packet timing, size, and direction sequences from I2P connections, together with Fano's inequality applied to quantify the maximum information leakage possible in mix networks.
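For orientation, the textbook form of Fano's inequality is reproduced below. How the paper instantiates it for I2P is not shown in this summary, so the mapping of symbols to service identities and traffic observations is an assumption made here for illustration.

```latex
% X: service identity, one of M candidate services (assumed mapping)
% Y: observed traffic features; \hat{X} = g(Y): any classifier's guess
% P_e = \Pr[\hat{X} \neq X]; H_b: binary entropy, in bits
\begin{align}
H(X \mid Y) &\le H_b(P_e) + P_e \log_2 (M - 1) \\
            &\le 1 + P_e \log_2 M \\
\Longrightarrow \quad P_e &\ge \frac{H(X \mid Y) - 1}{\log_2 M}
\end{align}
```

Read this way, a large residual uncertainty $H(X \mid Y)$ about the service forces a high error probability on every classifier, CNN or otherwise.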

If this is right

I2P users retain protection against passive traffic-analysis attacks that rely on current deep-learning classifiers.
Other anonymity systems built on similar mixing principles are likely to resist the same style of CNN-based identification.
Security assessments of anonymous networks should routinely include CNN experiments on synthetic traffic to check for hidden patterns.
Fano's inequality offers a reusable theoretical tool for placing quantitative limits on leakage in any mix-network design.
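To illustrate what a quantitative limit of this kind looks like in practice, the sketch below evaluates the commonly used weakened form of Fano's inequality, P_e >= (H(X|Y) - 1) / log2(M), where M is the number of candidate services and H(X|Y) is the residual uncertainty in bits. The function name and the example numbers are hypothetical; the paper's own formulation and values are not given in this summary.

```python
import math


def fano_error_lower_bound(cond_entropy_bits: float, num_classes: int) -> float:
    """Lower bound on any classifier's error probability (weakened Fano form).

    cond_entropy_bits: H(X|Y) in bits, the uncertainty about the class X
                       that remains after observing the features Y.
    num_classes:       M, the number of candidate classes (assumed >= 2).
    """
    if num_classes < 2:
        raise ValueError("need at least two candidate classes")
    bound = (cond_entropy_bits - 1.0) / math.log2(num_classes)
    # The bound is vacuous when H(X|Y) <= 1 bit, so clip it to [0, 1].
    return max(0.0, min(1.0, bound))


# Hypothetical example: 1024 services, 9 bits of remaining uncertainty.
print(fano_error_lower_bound(9.0, 1024))
```

With these made-up inputs, no classifier could be right more than about one time in five, regardless of architecture.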

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Designers of future anonymity overlays may need to monitor advances in traffic-classification models and adjust mixing parameters accordingly.
Similar CNN resistance tests could be applied to Tor or other low-latency anonymity systems to compare their relative robustness.
Real deployments would benefit from ongoing collection of fresh traffic traces to keep synthetic training distributions aligned with live usage.

Load-bearing premise

The synthetic traffic created in the laboratory captures the statistical features that real I2P users produce, and the CNNs are able to detect any distinguishing signals that encryption has left exposed.

What would settle it

A demonstration that the same CNN architecture, trained on fresh synthetic data, achieves high-accuracy identification of specific hidden services when tested on independent real-world I2P traffic traces would falsify the result.
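One way to make "high-accuracy" operational is to ask whether the observed number of correct identifications could plausibly come from random guessing. The sketch below computes an exact one-sided binomial tail probability in log space. It assumes equally likely services and independent test flows, and the function name is hypothetical; the summary does not state which significance criterion, if any, the paper uses.

```python
from math import exp, lgamma, log


def p_value_above_chance(correct: int, total: int, num_classes: int) -> float:
    """P(at least `correct` hits in `total` trials) under uniform random guessing.

    Assumes num_classes >= 2, so the chance level p = 1/num_classes lies in (0, 1).
    """
    p = 1.0 / num_classes

    def log_pmf(k: int) -> float:
        # log of C(total, k) * p^k * (1 - p)^(total - k), via log-gamma
        return (lgamma(total + 1) - lgamma(k + 1) - lgamma(total - k + 1)
                + k * log(p) + (total - k) * log(1.0 - p))

    tail = sum(exp(log_pmf(k)) for k in range(correct, total + 1))
    return min(1.0, tail)
```

A small value means the accuracy is unlikely under pure guessing; a value near 1 is what the paper's negative result would be expected to produce.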

Figures

Figures reproduced from arXiv: 2605.11606 by Dieter Arnold, Konrad Baechler, Luca Rohrer.

**Figure 2.** Schema of Inbound Tunnel creation and the process of anonymous writing to the network database using an …

**Figure 3.** Experimental framework. (Adjacent text, "A. Laboratory Setting": The laboratory environment is isolated from the real, public network and used for generating synthetic network traffic. Multiple I2P nodes, each within its own Docker container, run on a single host …)

**Figure 4.** Timing diagram from the sender 10.8.0.2 to the first node of the receiver’s inbound tunnel 10.8.0.11.

**Figure 5.** Visualization of I2P network traffic based on all recorded Wireshark data.

**Figure 6.** Outgoing tunnels from the sender

**Figure 7.** Number of packets per port for TCP and UDP to node 10.8.0.11.

**Figure 8.** Top 15 TCP and UDP packet sizes to node 10.8.0.11

**Figure 9.** Entropy of the payload of selected packets.

**Figure 10.** Estimated number of I2P nodes over time worldwide.

**Figure 11.** How one of the few reseed servers in the I2P network receives requests from I2P nodes over time. A clear 24-hour fluctuation pattern is visible, likely caused by I2P nodes in the U.S. time zone regularly leaving and rejoining the network

**Figure 12.** Elbow Method for Determining the Optimal Number of Clusters

**Figure 13.** Architecture of the CNN
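Figure 9 refers to the entropy of packet payloads. The summary does not say how the paper computes it, so the following is only a sketch assuming the usual Shannon entropy over the byte-value histogram; the function name is hypothetical.

```python
import math
from collections import Counter


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)  # iterating over bytes yields integers 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Well-encrypted payloads of sufficient length would be expected to score close to 8 bits per byte, which is why entropy alone rarely separates one encrypted service from another.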

Original abstract

This study investigates the potential for deanonymizing services within the Invisible Internet Project (I2P) network through passive traffic analysis and machine learning techniques. The primary objective is to identify distinctive patterns in I2P traffic despite the encryption of its payload. To achieve this, a controlled laboratory environment was established to generate synthetic I2P traffic, providing a training dataset for machine learning models. Furthermore, Fano's inequality is employed to perform a theoretical analysis of anonymous data transmission in mix networks such as I2P, thereby supporting a data-driven approach to uncover causal relationships. In computer experiments, advanced deep learning methods - particularly Convolutional Neural Networks - are applied within the laboratory I2P network, and their effectiveness is further evaluated using real-world traffic data. The results indicate that the proposed methodologies do not compromise the anonymity guarantees of the I2P network.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper applies CNNs to I2P traffic and reports a negative deanonymization result, but the real-world ground truth is shaky.


The main thing here is a negative result: CNN-based traffic analysis on both lab-generated and real I2P flows does not break the anonymity guarantees. That lines up with what mix networks are built to do, and the authors back it with a lab setup plus a nod to Fano's inequality for the theoretical side. The lab work looks like the stronger part, since they control the traffic generation and can train models on known patterns before testing. The theoretical reference gives the empirical claim a bit more grounding than pure fitting would have.

The real-world evaluation is where it gets thin. Getting reliable labels for service identity or endpoints in live I2P traffic is difficult without either injecting flows or relying on external knowledge the system is meant to hide. If those labels are synthetic or proxy-based rather than independently verified, the accuracy numbers cannot be read as strong evidence that the attack fails in practice. The abstract also skips concrete metrics, dataset sizes, and error bars, which makes it hard to gauge how decisive the no-compromise claim actually is.

This is useful reading for people who work on I2P specifically or on ML attacks against anonymity networks more broadly. It adds an I2P data point to the existing literature on Tor and similar systems, even if the methods themselves are not new.

I would send it to peer review. A clean negative result with better-documented labeling and numbers would be worth having in the record, and the current version is close enough that referees could sort out the gaps without starting from scratch.

Referee Report

3 major / 3 minor

Summary. The manuscript investigates deanonymization of I2P services via passive traffic analysis using convolutional neural networks. It generates synthetic I2P traffic in a controlled lab for model training, invokes Fano's inequality for a theoretical bound on anonymity in mix networks, applies CNNs to both synthetic and real-world I2P traffic, and concludes that the proposed methods do not compromise I2P anonymity guarantees.

Significance. If the central empirical claim holds after addressing ground-truth issues, the work would supply concrete evidence that I2P traffic remains resistant to CNN-based passive attacks, even when models are trained on synthetic data and tested on live captures. This would be a useful data point for the anonymous-communication literature, particularly as deep-learning traffic analysis becomes more common.

major comments (3)

[Experiments section] Real-world evaluation (Experiments section): The claim that CNN performance on real-world I2P traffic demonstrates no effective deanonymization assumes access to independently verifiable ground-truth labels (service identity, origin, or destination). Obtaining such labels in a live anonymous network without active injection or external side-channel knowledge contradicts the anonymity properties under test; if labels are synthetic, self-generated, or proxy-based, the measured accuracy cannot be interpreted as evidence that the attack fails in practice.
[§3] Theoretical analysis (Abstract and §3): Fano's inequality is invoked to support a data-driven approach to anonymity, yet no specific derivation, application to I2P traffic features, or equation linking the bound to the CNN feature space is provided. Without this, it is unclear whether the inequality supplies an independent limit or merely restates the empirical observation in general terms.
[Methodology and Experiments sections] Synthetic-to-real generalization (Methodology and Experiments sections): The central no-compromise conclusion rests on models trained exclusively on laboratory synthetic traffic being evaluated on real-world captures. No quantitative validation (e.g., statistical comparison of flow statistics, packet-size distributions, or timing features) is reported to confirm that the synthetic data distribution matches live I2P usage, leaving open the possibility that the models simply fail to capture real distinguishing features.

minor comments (3)

[Abstract] Abstract: No numerical results (accuracy, precision, recall, dataset sizes, or error bars) are stated, preventing even a preliminary assessment of the strength of the 'no compromise' conclusion.
The manuscript should supply the CNN architecture diagram, layer dimensions, hyperparameter choices, and training/validation split details to support reproducibility.
A dedicated limitations subsection discussing the gap between lab-generated and live I2P traffic would strengthen the presentation.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with point-by-point responses and indicate where revisions will be made to improve clarity and rigor.


Referee: [Experiments section] Real-world evaluation (Experiments section): The claim that CNN performance on real-world I2P traffic demonstrates no effective deanonymization assumes access to independently verifiable ground-truth labels (service identity, origin, or destination). Obtaining such labels in a live anonymous network without active injection or external side-channel knowledge contradicts the anonymity properties under test; if labels are synthetic, self-generated, or proxy-based, the measured accuracy cannot be interpreted as evidence that the attack fails in practice.

Authors: We agree that verifiable ground-truth labels are critical for valid interpretation. In our real-world evaluation, the traffic was generated by services that we operated and controlled within the live I2P network; this allowed direct, self-verifiable labeling of flows by service identity without requiring side-channel information from other users. The captured traffic is genuine I2P traffic subject to the network's mixing and encryption. We will revise the Experiments section to explicitly describe this controlled-operator setup and discuss its scope and limitations relative to fully blind real-world attacks. revision: yes
Referee: [§3] Theoretical analysis (Abstract and §3): Fano's inequality is invoked to support a data-driven approach to anonymity, yet no specific derivation, application to I2P traffic features, or equation linking the bound to the CNN feature space is provided. Without this, it is unclear whether the inequality supplies an independent limit or merely restates the empirical observation in general terms.

Authors: We accept that the current presentation of the theoretical analysis lacks sufficient detail. In the revised manuscript we will expand §3 with an explicit derivation of Fano's inequality applied to the anonymity setting of mix networks, including the relevant equations that relate the CNN classification error probability to the conditional entropy of service identity given the observed traffic features. This will clarify the connection between the theoretical bound and the empirical CNN results. revision: yes
Referee: [Methodology and Experiments sections] Synthetic-to-real generalization (Methodology and Experiments sections): The central no-compromise conclusion rests on models trained exclusively on laboratory synthetic traffic being evaluated on real-world captures. No quantitative validation (e.g., statistical comparison of flow statistics, packet-size distributions, or timing features) is reported to confirm that the synthetic data distribution matches live I2P usage, leaving open the possibility that the models simply fail to capture real distinguishing features.

Authors: We acknowledge the absence of quantitative distribution matching in the current version. We will add a dedicated subsection (or appendix) that reports statistical comparisons between the synthetic and real-world datasets, including packet-size histograms, inter-arrival time distributions, flow duration statistics, and results from appropriate tests such as the Kolmogorov-Smirnov test. These additions will either support the generalization claim or allow us to qualify the conclusions accordingly. revision: yes
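The distribution comparison promised in the last response could look roughly like the sketch below, which computes the two-sample Kolmogorov-Smirnov statistic for one feature (for example, packet sizes) from a synthetic and a real capture. This is a dependency-free illustration with a hypothetical function name, not the authors' analysis; in practice `scipy.stats.ks_2samp` offers the same statistic together with a p-value.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The maximum gap is always attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical packet sizes (bytes) from a lab capture and a live capture.
synthetic_sizes = [1, 2, 3, 4]
real_sizes = [3, 4, 5, 6]
print(ks_statistic(synthetic_sizes, real_sizes))
```

A statistic near 0 suggests the two feature distributions are similar, while a value near 1 means they barely overlap, which would undermine training on synthetic traffic alone.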

Circularity Check

0 steps flagged

No circularity: empirical CNN evaluation and Fano bound are independent of inputs


The paper trains CNN classifiers on laboratory-generated synthetic I2P flows and reports performance on separate real-world captures, then invokes Fano's inequality as an external information-theoretic bound on mix-network anonymity. No equation, parameter fit, or self-citation is shown to redefine the target quantity (deanonymization success) in terms of the training labels or model outputs. The central claim therefore does not reduce to its own inputs by construction and remains falsifiable against external traffic traces.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that lab-generated traffic matches real I2P behavior and that standard ML generalization holds; Fano's inequality is treated as a standard information-theoretic bound.

axioms (1)

[standard math] Fano's inequality provides a valid upper bound on information leakage in mix networks such as I2P. Invoked to support theoretical analysis of anonymous data transmission.


