pith. machine review for the scientific record.

arxiv: 2605.02795 · v1 · submitted 2026-05-04 · 💻 cs.CR · cs.NI

Recognition: 2 theorem links

Analyzing Unsolicited Internet Traffic: Measuring IoT Security Threats via Network Telescopes


Pith reviewed 2026-05-08 18:32 UTC · model grok-4.3

classification 💻 cs.CR cs.NI
keywords: network telescope, IoT security, unsolicited traffic, scanning patterns, Telnet, entropy analysis, cyber threats, reconnaissance

The pith

The top 1 percent of source IP addresses generates over 81 percent of the unsolicited traffic seen by a network telescope.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Network telescopes passively collect unsolicited internet packets to reveal global scanning patterns. This study processes a ten-day dataset of roughly 22 million packets and shows that the top one percent of source addresses produce more than eighty-one percent of the observed volume. It also documents heavy concentration on Telnet ports 23 and 2323 along with synchronized spikes in packet count and Shannon entropy that point to coordinated reconnaissance against legacy IoT devices. These results matter because they demonstrate a lightweight, privacy-preserving method for spotting large-scale threat activity without inspecting packet contents.

Core claim

Analysis of traffic captured by the ORION network telescope during January 2025 reveals a highly structured and centralized ecosystem in which the top 1 percent of source IP addresses generate over 81 percent of total packets, with dominant activity on ports 23 and 2323 and synchronized entropy surges that indicate coordinated multi-vector IoT reconnaissance campaigns.
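The concentration figure in the core claim is a simple ranking statistic, and a minimal sketch may make it concrete. The code below is an illustration under stated assumptions, not the authors' pipeline: the function name `top_share`, the rounding rule for the cutoff, and the toy addresses (taken from documentation-only ranges) are all hypothetical.

```python
from collections import Counter
import math

def top_share(source_ips, fraction=0.01):
    """Share of packets sent by the busiest `fraction` of distinct sources."""
    counts = Counter(source_ips)
    if not counts:
        raise ValueError("no packets to analyze")
    # Rank sources by packet count, busiest first.
    ranked = sorted(counts.values(), reverse=True)
    # Assumption: round the cutoff up and always keep at least one source.
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

# Toy data: one heavy source plus 99 sources that send a single packet each.
packets = ["198.51.100.1"] * 900 + [f"203.0.113.{i}" for i in range(1, 100)]
print(f"{top_share(packets):.3f}")  # prints 0.901
```

On real telescope metadata the input would be the source-address column of the packet records; how ties and the cutoff are handled can shift the result slightly, which is one reason the exact definition matters.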

What carries the argument

Privacy-preserving metadata analysis paired with lightweight behavioral heuristics that detect scanning and backscatter patterns plus synchronized entropy surges without payload inspection.

If this is right

  • Blocking or rate-limiting the small set of dominant source addresses could reduce the bulk of observed scanning volume.
  • The continued dominance of Telnet ports shows that weak-credential attacks on legacy IoT devices remain widespread.
  • Entropy-based signals offer a practical marker for identifying coordinated, multi-vector reconnaissance in near real time.
  • The same lightweight analysis can support educational datasets and operational threat monitoring without requiring deep packet inspection.
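The entropy signal mentioned in the list above can be computed from metadata alone. The sketch below shows Shannon entropy over a window of destination ports, plus a 0 to 1 normalization of the kind Figure 9 reports. The function names are hypothetical, and dividing by the log of the number of distinct observed values is an assumption; the paper's exact normalization is not given in the text available here.

```python
from collections import Counter
import math

def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over observed values of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def normalized_entropy(values):
    """Entropy scaled to [0, 1] by the maximum for the observed alphabet."""
    distinct = len(set(values))
    if distinct <= 1:
        return 0.0  # a single value (or no data) carries no uncertainty
    return shannon_entropy(values) / math.log2(distinct)

# A window that only targets Telnet versus one spread evenly over four ports.
print(shannon_entropy([23, 23, 23, 23]))          # prints 0.0
print(normalized_entropy([23, 2323, 80, 443]))    # prints 1.0
```

Low entropy indicates traffic focused on a few ports; a rise indicates activity spreading across more ports within the same window.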

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same metadata-only approach could be adapted to live network sensors for early warning of emerging botnet activity.
  • Similar centralization patterns may appear when the method is applied to traffic from other geographic or institutional telescopes.
  • Device manufacturers could prioritize default-credential changes on ports 23 and 2323 as a high-impact mitigation step.
  • Longer observation windows might reveal whether the top sources rotate or remain stable over months.

Load-bearing premise

Metadata and simple behavioral rules alone can reliably separate coordinated scanning campaigns from ordinary background traffic.

What would settle it

A new telescope dataset in which the top 1 percent of sources no longer produce over 81 percent of traffic or in which entropy surges fail to align with known attack events would undermine the central claims.

Figures

Figures reproduced from arXiv: 2605.02795 by Asma Jodeiri Akbarfam, Garrett Gastman, Raul Martinez, Shereen Ismail, Taelyn Dyer, Yozelyn Chavez.

Figure 1: Taxonomy of darknet research, categorized by col…
Figure 2: Analysis workflow for the darknet dataset including…
Figure 3: Top 10 countries contributing unsolicited traffic.
Figure 4: Hourly traffic volume from the top five ASNs.
Figure 5: Volume versus burstiness classification for the top…
Figure 6: Comparison of packet volume across high-risk and…
Figure 7: Lorenz curve illustrating traffic concentration across…
Figure 9: Normalized entropy measures (0–1 scale) aligned…

Original abstract

Network telescopes serve as a critical passive monitoring tool for capturing unsolicited Internet traffic, providing insights into global scanning and reconnaissance behavior. This study analyzes a 10-day dataset during January 2025 consisting of approximately 22 million packets collected by the ORION network telescope at Merit Network. By employing privacy-preserving metadata analysis and lightweight behavioral heuristics, we identify scanning and backscatter patterns without payload inspection. Our results reveal a highly structured and centralized ecosystem, where the top 1% of source IP addresses generate over 81% of total traffic. A significant finding is the dominance of Port 23 (Telnet) and Port 2323 (Telnet Alt), which highlights the persistent nature of IoT security threats and widespread attempts to exploit weak credentials in legacy IoT devices. Furthermore, synchronized surges in packet volume and Shannon entropy indicate coordinated, multi-vector reconnaissance campaigns. These findings offer a practical framework for identifying large-scale threat activity and support cybersecurity research and education.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes approximately 22 million packets of unsolicited traffic captured over 10 days in January 2025 by the ORION network telescope. Using privacy-preserving metadata analysis and lightweight behavioral heuristics without payload inspection, it identifies scanning and backscatter patterns. Key results include a highly centralized traffic distribution (top 1% of source IPs generating >81% of traffic), dominance of Telnet ports 23 and 2323, and interpretation of synchronized packet-volume and Shannon-entropy surges as evidence of coordinated multi-vector IoT reconnaissance campaigns. The work proposes a practical framework for large-scale threat identification.

Significance. If the heuristics prove robust and the coordination interpretation is validated, the observational dataset and metadata-only approach would provide useful empirical evidence on the structure of IoT scanning ecosystems and a scalable monitoring method. The privacy-preserving design and focus on legacy IoT ports are strengths. However, the absence of ground-truth validation or ablation studies currently limits the reliability of the central threat-attribution claims.

major comments (2)
  1. Abstract and results: The claim that synchronized surges in packet volume and Shannon entropy indicate 'coordinated, multi-vector reconnaissance campaigns' is load-bearing for the paper's contribution on threat identification, yet it rests on lightweight behavioral heuristics applied to metadata alone. No details are given on exact heuristic definitions, parameter sensitivity, comparison against known botnet timelines, or ablation versus simpler volume thresholds; alternative explanations (diurnal effects, single-source rate variations, backscatter) are not addressed. This leaves the coordination inference under-supported.
  2. Methodology: The manuscript states that scanning and backscatter patterns are distinguished without payload inspection, but provides no quantitative description of the heuristics, validation against ground truth, error bars, or robustness checks. Because these distinctions underpin both the port-dominance findings and the entropy-volume correlation, the lack of methodological transparency weakens the central claims.
minor comments (2)
  1. Abstract: The data-collection period and volume are stated, but the manuscript would benefit from explicit mention of the telescope's vantage point characteristics and any filtering applied before the 22-million-packet count.
  2. Presentation: Figures or tables reporting the top-1% traffic share and port distributions should include confidence intervals or sensitivity ranges to allow readers to assess stability of the 81% figure.
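One way to produce the sensitivity range requested in the second minor comment is a percentile bootstrap over packets. The sketch below is a hedged illustration, not a procedure taken from the manuscript: `top_share` and `bootstrap_interval` are hypothetical names, the toy addresses come from documentation-only ranges, and resampling individual packets treats them as independent draws, which real scanning traffic is unlikely to satisfy.

```python
import math
import random
from collections import Counter

def top_share(source_ips, fraction=0.01):
    """Share of packets sent by the busiest `fraction` of distinct sources."""
    ranked = sorted(Counter(source_ips).values(), reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

def bootstrap_interval(source_ips, fraction=0.01, rounds=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for top_share, resampling packets."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(source_ips)
    # Recompute the statistic on `rounds` resamples drawn with replacement.
    estimates = sorted(
        top_share(rng.choices(source_ips, k=n), fraction) for _ in range(rounds)
    )
    lo = estimates[int((alpha / 2) * rounds)]
    hi = estimates[min(rounds - 1, int((1 - alpha / 2) * rounds))]
    return lo, hi

# Toy data: one heavy source plus 99 sources that send a single packet each.
packets = ["198.51.100.1"] * 900 + [f"203.0.113.{i}" for i in range(1, 100)]
low, high = bootstrap_interval(packets)
print(f"top-1% share {top_share(packets):.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

Resampling whole time blocks or whole sources instead of packets would be a more conservative choice when traffic is bursty; the packet-level version is shown only because it is the shortest to state.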

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. We agree that the manuscript requires greater methodological transparency and explicit discussion of alternative explanations to strengthen the central claims. We address each major comment below and will revise the paper accordingly.

point-by-point responses (2)
  1. Referee: Abstract and results: The claim that synchronized surges in packet volume and Shannon entropy indicate 'coordinated, multi-vector reconnaissance campaigns' is load-bearing for the paper's contribution on threat identification, yet it rests on lightweight behavioral heuristics applied to metadata alone. No details are given on exact heuristic definitions, parameter sensitivity, comparison against known botnet timelines, or ablation versus simpler volume thresholds; alternative explanations (diurnal effects, single-source rate variations, backscatter) are not addressed. This leaves the coordination inference under-supported.

    Authors: We acknowledge that the coordination inference requires stronger support through explicit methodology. In the revised manuscript we will add a new subsection detailing the exact heuristic: synchronized surges are flagged when both packet volume and Shannon entropy (computed over 5-minute bins on destination-port distributions) exceed a z-score threshold of 2.0 relative to a 24-hour rolling baseline. We will include a parameter-sensitivity table showing how results change across thresholds (1.5–3.0) and bin sizes (1–15 min). Alternative explanations will be discussed, including diurnal cycles (ruled out by the multi-port, multi-day persistence) and backscatter (inconsistent with the observed source-IP concentration and entropy rise). Direct timeline matching to specific botnets is limited by the anonymized telescope data; we will reference Mirai-era reports as contextual support and rephrase the claim as 'suggestive of coordinated activity' rather than definitive. These additions will appear in both the Methodology and Results sections. revision: yes

  2. Referee: Methodology: The manuscript states that scanning and backscatter patterns are distinguished without payload inspection, but provides no quantitative description of the heuristics, validation against ground truth, error bars, or robustness checks. Because these distinctions underpin both the port-dominance findings and the entropy-volume correlation, the lack of methodological transparency weakens the central claims.

    Authors: We agree that quantitative detail is currently insufficient. The revised Methodology section will specify the classification rules: scanning is identified by high source-IP diversity (>100 distinct sources per port in a bin) combined with low per-source entropy (<1.5 bits), while backscatter is flagged by concentrated destination-IP responses with elevated TCP flags. We will provide the Shannon-entropy formula used, pseudocode for the surge detector, and bootstrap-derived error bars on the top-1% traffic share. Robustness will be demonstrated by repeating the analysis with varied window sizes and noise-injection tests. Full ground-truth validation is not feasible with passive metadata-only telescope data; we will state this limitation explicitly and instead cross-validate against synthetic scanning traces and published IoT-botnet traffic signatures. These changes will be incorporated as a new 'Heuristic Definitions and Validation' subsection. revision: partial
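The surge rule described in the first response (5-minute bins, a z-score threshold of 2.0, and a 24-hour rolling baseline) can be sketched as follows. This is an illustration of that description, not the pseudocode the simulated authors promise: the names `zscore` and `flag_surges` are hypothetical, and the choices to exclude the current bin from its own baseline, to use the sample standard deviation, and to return a z-score of 0 for a flat or too-short baseline are assumptions.

```python
import statistics

def zscore(value, history):
    """Z-score of `value` against `history`; 0.0 when the baseline is unusable."""
    if len(history) < 2:
        return 0.0
    spread = statistics.stdev(history)
    if spread == 0:
        return 0.0  # assumption: a perfectly flat baseline never flags
    return (value - statistics.mean(history)) / spread

def flag_surges(volumes, entropies, window=288, threshold=2.0):
    """Indices of bins where volume AND entropy both exceed the threshold.

    `window` is the baseline length in bins; 288 five-minute bins span 24 hours.
    """
    if len(volumes) != len(entropies):
        raise ValueError("volume and entropy series must be the same length")
    flagged = []
    for i in range(len(volumes)):
        start = max(0, i - window)
        # The baseline covers the preceding bins only, not the current one.
        v_z = zscore(volumes[i], volumes[start:i])
        e_z = zscore(entropies[i], entropies[start:i])
        if v_z > threshold and e_z > threshold:
            flagged.append(i)
    return flagged

# Toy series: bin 6 spikes in both packet count and port entropy.
volumes = [100, 102, 98, 101, 99, 100, 300, 100]
entropies = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 3.0, 1.0]
print(flag_surges(volumes, entropies))  # prints [6]
```

Requiring both series to exceed the threshold is what separates this rule from a plain volume alarm: a spike in packet count with unchanged port entropy is not flagged. Whether that conjunction is enough to infer coordination is the point the referee disputes.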

standing simulated objections not resolved
  • Ground-truth validation of the coordination and scanning/backscatter heuristics, which cannot be obtained from this passive, metadata-only observational dataset without additional active or payload-bearing data sources.

Circularity Check

0 steps flagged

No significant circularity in observational traffic analysis

full rationale

The paper reports direct empirical measurements from a fixed 10-day ORION telescope dataset using metadata-only heuristics. Key results (top 1% IPs generating 81% traffic, Telnet port dominance, volume-entropy surges) are statistical summaries of observed packets, not outputs of equations, fitted parameters, or models that reduce to inputs by construction. No derivations, self-citations, uniqueness theorems, or ansatzes appear in the provided text. The coordination interpretation is post-hoc labeling of patterns, not a load-bearing derivation. This is self-contained data analysis against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the assumption that network telescopes capture representative unsolicited traffic and that simple metadata heuristics plus entropy can identify coordinated campaigns without payload data. No free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption: Network telescopes passively capture unsolicited internet traffic that reflects global scanning and reconnaissance behavior.
    Invoked as the foundation for data collection and interpretation in the abstract.
  • domain assumption: Behavioral heuristics applied to metadata alone can distinguish scanning patterns and backscatter without false positives from other traffic.
    Required for the identification of IoT threats and coordinated campaigns.

pith-pipeline@v0.9.0 · 5483 in / 1477 out tokens · 80626 ms · 2026-05-08T18:32:56.397162+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Characterizing AI-Assisted Bot Traffic in Darknet Data: Implications for ICS and IIoT Security

    cs.CR · 2026-05 · unverdicted · novelty 5.0

    Darknet analysis shows ICS bot traffic doubling from 0.82% to 1.51% over four years, with micro-pacing enabling 97.47% evasion of standard volumetric IDS thresholds.
