pith. machine review for the scientific record.

arxiv: 2605.13593 · v1 · submitted 2026-05-13 · 💻 cs.IR

Recognition: no theorem link

Benchmarking the Open Science Data Federation services to develop XRootD best practices

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 17:58 UTC · model grok-4.3

classification 💻 cs.IR
keywords: pelican, xrootd, data, osdf, various, clients, configuration, create

The pith

Benchmarks of XRootD performance under different configurations provide recommendations for best practices in the Open Science Data Federation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The Open Science Data Federation aims to share scientific data globally using the Pelican platform and XRootD software. The researchers tested how fast data can be transferred for different file sizes and from different distances, using standalone clients such as wget, curl, and pelican as well as HTCondor's native file-transfer mechanism. They varied the number of parallel streams and the buffer settings to see what works best. The goal is to give practical advice to the teams building these tools so data moves efficiently without hitting limits.
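The sweep described above can be sketched as a minimal harness. The `fetch` callable, the file sizes, and the stream counts below are placeholders standing in for the paper's actual client invocations (wget, curl, pelican), not its real parameters:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def benchmark(fetch, file_sizes, stream_counts):
    """Time `fetch(size)` under varying parallelism and report MB/s.

    `fetch` stands in for a real client call against an OSDF origin
    or cache; the size x stream-count grid mirrors the paper's sweep.
    """
    results = {}
    for size in file_sizes:
        for streams in stream_counts:
            start = time.perf_counter()
            with ThreadPoolExecutor(max_workers=streams) as pool:
                # Each worker "downloads" one copy of the file.
                transferred = sum(pool.map(fetch, [size] * streams))
            elapsed = time.perf_counter() - start
            results[(size, streams)] = transferred / elapsed / 1e6  # MB/s
    return results

# Dummy fetch: "transfers" `size` bytes instantly. Replacing it with a
# real HTTP GET would reproduce the shape of the paper's tests.
rates = benchmark(lambda size: size, [1_000_000, 100_000_000], [1, 4, 16])
```

The dictionary keyed by (file size, stream count) is one natural way to hold such a grid; the paper does not specify its own harness structure.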

Core claim

The benchmarks, executed on NRP hosts, cover a range of file sizes, parallel-stream counts, client distances, and standalone clients, tracking how XRootD and Pelican perform in different scenarios.

Load-bearing premise

The assumption that tests on National Research Platform hosts with specific clients and distances are representative of real-world OSDF usage scenarios.

Figures

Figures reproduced from arXiv:2605.13593 by Fabio Andrijauskas, Frank Würthwein, Igor Sfiligoi.

Figure 2
Figure 2: Results obtained from accessing the target origin and cache from the Jacksonville, Florida, test point, running for 15 minutes per combination of parallel-thread count and file size; at the end of each test, the transfer rate is calculated from the elapsed time and the downloaded file size. The results highlight the impact of varying file sizes and parallel requests through regular HTTP access. Th… view at source ↗
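The rate calculation the caption describes reduces to a one-liner. The Mbit/s unit here is an assumption, since the excerpt does not state the figure's units:

```python
def transfer_rate_mbps(downloaded_bytes: int, elapsed_seconds: float) -> float:
    """Transfer rate as described in the caption: bytes moved over time
    taken, expressed here in megabits per second."""
    return downloaded_bytes * 8 / elapsed_seconds / 1e6

# e.g. a 1 GB file downloaded in 10 s:
transfer_rate_mbps(1_000_000_000, 10.0)  # 800.0 Mbit/s
```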
read the original abstract

Research has become dependent on processing power and storage, one crucial aspect being data sharing. The Open Science Data Federation (OSDF) project aims to create a scientific global data distribution network based on the Pelican Platform. OSDF relies on the XRootD and Pelican projects. Nevertheless, OSDF must understand the XRootD limits under various configuration options, including transfer rate limits, proper buffer configuration, and storage type effect. We have thus executed a set of benchmarks to create a set of recommendations to share with the XRootD and Pelican teams. This work describes the tests and results performed using National Research Platform (NRP) hosts. The tests cover various file sizes and parallel streams and use clients from various distances from the server host. We also used several standalone clients (wget, curl, pelican) and the native HTCondor file transfer mechanisms. Applying the methodology creates a possibility to track how XRootD and the Pelican layer perform in different scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript describes an empirical benchmarking study of XRootD and Pelican services for the Open Science Data Federation (OSDF) performed exclusively on National Research Platform (NRP) hosts. Tests vary file sizes, numbers of parallel streams, client-server distances, and client tools (wget, curl, pelican, HTCondor file transfer) to measure performance and derive configuration recommendations for transfer-rate limits, buffer sizing, and storage-type effects.

Significance. If the NRP results generalize, the work supplies concrete empirical data that could guide practical configuration choices for data transfers in scientific federations, directly supporting OSDF operations and the XRootD/Pelican developer communities. The purely empirical design with no fitted parameters or invented entities is a methodological strength that permits straightforward replication of the reported scenarios.

major comments (2)
  1. The central claim that NRP benchmarks suffice to produce OSDF-wide best practices depends on the unvalidated assumption that NRP's uniform storage and network conditions are representative of heterogeneous production OSDF sites. No cross-site replication, sensitivity analysis across storage backends (Ceph vs. Lustre), or comparison against actual OSDF traffic traces is provided, so site-specific effects on buffer sizing or stream scaling remain untested.
  2. Although the abstract states that storage-type effects must be understood, the reported experiments contain no explicit variation or measurement of different storage backends; therefore the resulting recommendations for buffer configuration and transfer limits cannot be shown to hold under the diversity of OSDF storage systems.
minor comments (2)
  1. Specify the exact performance metrics (throughput, latency, error rates) and any statistical aggregation or error-handling procedures used to support the recommendations.
  2. Clarify how client-distance and parallel-stream results are aggregated across runs and whether any confidence intervals or variability measures are reported.
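One conventional way to report the variability the referee asks for is a per-scenario mean with a normal-approximation 95% confidence interval over repeated runs. The paper does not state its aggregation procedure, and the throughput numbers below are purely illustrative:

```python
import statistics

def summarize_runs(throughputs_mbps):
    """Mean throughput across repeated runs with a normal-approximation
    95% confidence interval (1.96 * standard error). One conventional
    choice; the manuscript does not specify its own aggregation."""
    n = len(throughputs_mbps)
    mean = statistics.fmean(throughputs_mbps)
    se = statistics.stdev(throughputs_mbps) / n ** 0.5
    return mean, (mean - 1.96 * se, mean + 1.96 * se)

# Five hypothetical repeat runs of one (file size, streams, distance) cell:
mean, ci = summarize_runs([812.0, 795.5, 830.2, 804.8, 821.1])
```

Reporting the interval alongside the mean would let readers judge whether differences between configurations exceed run-to-run noise.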

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the text to more accurately reflect the scope and limitations of the NRP-based benchmarks.

read point-by-point responses
  1. Referee: The central claim that NRP benchmarks suffice to produce OSDF-wide best practices depends on the unvalidated assumption that NRP's uniform storage and network conditions are representative of heterogeneous production OSDF sites. No cross-site replication, sensitivity analysis across storage backends (Ceph vs. Lustre), or comparison against actual OSDF traffic traces is provided, so site-specific effects on buffer sizing or stream scaling remain untested.

    Authors: We agree that the NRP environment offers relatively uniform conditions and that the manuscript does not include cross-site replication or direct comparisons to production OSDF traffic. The work is presented as an empirical baseline study performed on NRP hosts, with the methodology intended to be reusable at other sites. In the revised version we will modify the title, abstract, and conclusions to describe the output as 'NRP-derived configuration recommendations for OSDF consideration' rather than implying OSDF-wide applicability. A new limitations section will explicitly note the absence of heterogeneous-site validation and the need for future site-specific testing. These textual changes will be incorporated. revision: yes

  2. Referee: Although the abstract states that storage-type effects must be understood, the reported experiments contain no explicit variation or measurement of different storage backends; therefore the resulting recommendations for buffer configuration and transfer limits cannot be shown to hold under the diversity of OSDF storage systems.

    Authors: The referee is correct: the experiments used the storage configuration present on the NRP hosts and did not vary backends such as Ceph or Lustre. Although the abstract identifies storage-type effects as important, the reported tests focused on file size, stream count, distance, and client tools. We will revise the abstract and introduction to state that storage-type effects are recognized as critical for future work and will qualify all recommendations as conditioned on the tested storage setup. This clarification will be added in the next manuscript version. revision: yes

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical study focused on benchmarking performance; no free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5477 in / 885 out tokens · 40910 ms · 2026-05-14T17:58:51.349355+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

  1. [1]

Andrijauskas, F.

    F. Andrijauskas, I. Sfiligoi, F. Würthwein, Defining a canonical unit for accounting purposes. In Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good (PEARC '23). Association for Computing Machinery, New York, NY, USA, 288–291. (2023) https://doi.org/10.1145/3569951.3597574

  2. [2]

Andrijauskas, F.

F. Andrijauskas, D. Weitzel, F. Würthwein, Open Science Data Federation - operation and monitoring. In Practice and Experience in Advanced Research Computing 2024: Human Powered Computing (PEARC '24). Association for Computing Machinery, New York, NY, USA, Article 63, 1–5 (2024). https://doi.org/10.1145/3626203.3670557

  3. [3]

Dorigo, A.

A. Dorigo, P. Elmer, F. Furano, and A. Hanushevsky. XROOTD/TXNetFile: a highly scalable architecture for data access in the ROOT environment. In Proceedings of the 4th WSEAS International Conference on Telecommunications and Informatics (TELE-INFO'05). World Scientific and Engineering Academy and Society (WSEAS), Stevens Point, Wisconsin, USA, Article 46...

  4. [4]

    Z. Deng, A. Sim, K. Wu, C. Guok, D. Hazen, I. Monga, F. Andrijauskas, F. Würthwein, D. Weitzel. Analyzing Transatlantic Network Traffic over Scientific Data Caches. In Proceedings of the 2023 on Systems and Network Telemetry and Analytics (SNTA '23). Association for Computing Machinery, New York, NY, USA, 19–22. (2023) https://doi.org/10.1145/3589012.3594...