Using the Open Science Data Federation for data distribution: Big Bear Solar Observatory use case
Pith reviewed 2026-05-19 15:19 UTC · model grok-4.3
The pith
Integrating Big Bear Solar Observatory data into the Open Science Data Federation provides standard reliable access and enables global image processing pipelines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that integrating the Big Bear Solar Observatory data into the Open Science Data Federation provided standard and reliable data access. The OSDF caches deliver local copies of the data worldwide. This integration made it possible to create a pipeline that applies image processing techniques to all images from BBSO from any point on the planet.
What carries the argument
The Open Science Data Federation (OSDF) is a global network expanded from StashCache to include twenty data origins and thirty caches, together with new access methods and monitoring tools. It carries the data from BBSO origins to local caches for worldwide use.
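The origin-to-cache flow described here can be illustrated with a toy model. This is a minimal sketch, not OSDF code: the `Origin` and `Cache` classes and the object path are hypothetical, and real caches also handle eviction, authorization, and partial reads.

```python
from dataclasses import dataclass, field


@dataclass
class Origin:
    """Authoritative store; stands in for a data origin (hypothetical model)."""
    objects: dict[str, bytes]
    reads: int = 0

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self.objects[path]


@dataclass
class Cache:
    """Site-local cache that fills itself from the origin on a miss."""
    origin: Origin
    store: dict[str, bytes] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def get(self, path: str) -> bytes:
        if path in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[path] = self.origin.read(path)
        return self.store[path]


# Hypothetical object path, used only for illustration.
origin = Origin({"/bbso/example/img_0001.fts": b"pixels"})
cache = Cache(origin)
first = cache.get("/bbso/example/img_0001.fts")   # miss: fetched from the origin
second = cache.get("/bbso/example/img_0001.fts")  # hit: served from the local copy
```

Only the first request reaches the origin; repeat requests at the same site are served locally, which is the behavior the paper's "local data worldwide" claim relies on.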
If this is right
- Researchers obtain standard and reliable access to BBSO data through the OSDF.
- OSDF caches place local copies of the data at sites worldwide.
- An image-processing pipeline for all BBSO images can run from any global location.
- The OSDF meets sharing requirements in recent NSF solicitations for national cyber-infrastructure.
Where Pith is reading between the lines
- The same integration pattern could be tested with data from other solar or astronomical observatories to check whether comparable pipelines become feasible.
- Wider adoption might reduce the need for each research group to maintain its own data mirrors for large image collections.
- Performance under higher data volumes from future instruments could be measured to see where the current cache network reaches limits.
Load-bearing premise
The OSDF infrastructure of origins, caches, and access methods will deliver reliable performance and local availability for the volume of BBSO data without failures or bottlenecks.
What would settle it
A measurement showing that data retrieval from remote locations still incurs long delays or that the global image-processing pipeline fails to finish when it relies on OSDF caches for BBSO data would falsify the claim.
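Such a measurement could be sketched as follows. The `fetch` callable, the paths, and the latency budget are all assumptions supplied by whoever runs the check; nothing here is taken from the paper or from an OSDF client.

```python
import time
from statistics import median
from typing import Callable, Iterable


def measure_latencies(fetch: Callable[[str], bytes],
                      paths: Iterable[str]) -> list[float]:
    """Time each retrieval; a failed fetch is recorded as infinite latency."""
    latencies = []
    for path in paths:
        start = time.perf_counter()
        try:
            fetch(path)
            latencies.append(time.perf_counter() - start)
        except Exception:
            latencies.append(float("inf"))
    return latencies


def claim_survives(latencies: list[float], max_median_s: float) -> bool:
    """True if at least one fetch ran, none failed, and the median is in budget."""
    if not latencies or any(t == float("inf") for t in latencies):
        return False
    return median(latencies) <= max_median_s


def fake_fetch(path: str) -> bytes:
    """Stand-in accessor that always succeeds immediately."""
    return b"data"


def broken_fetch(path: str) -> bytes:
    """Stand-in accessor that always fails."""
    raise OSError("unreachable")


ok = claim_survives(measure_latencies(fake_fetch, ["a", "b"]), max_median_s=5.0)
failed = claim_survives(measure_latencies(broken_fetch, ["a"]), max_median_s=5.0)
```

Running the same check from several remote sites, with a real accessor in place of `fake_fetch`, would give the before/after evidence the claim currently lacks.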
Original abstract
The growing demand for extensive data processing is now a standard in many scientific fields. Efficiently distributing data to processing sites and enabling seamless sharing has become crucial. The Open Science Data Federation (OSDF) builds on the success of the StashCache project to establish a global data distribution network. By expanding StashCache, OSDF integrates additional data origins and caches, enhancing accessibility and performance (20 origins and 30 caches), new access methods, and monitoring and accounting mechanisms. Additionally, the OSDF has become essential to the US national cyber-infrastructure landscape due to the sharing requirements of recent NSF solicitations. One use case for the OSDF is the data access to the Big Bear Solar Observatory (BBSO). Integrating the BBSO data into the OSDF provided standard and reliable data access. Moreover, the OSDF caches provide local data worldwide. Using the OSDF and the BBSO data, creating a pipeline to apply image processing techniques to all images from BBSO anywhere on the planet was possible.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper describes the Open Science Data Federation (OSDF) as an expansion of StashCache into a global data distribution network incorporating 20 origins and 30 caches along with new access methods and monitoring/accounting mechanisms. It presents the integration of Big Bear Solar Observatory (BBSO) data as a use case, asserting that this integration delivers standard and reliable data access, that OSDF caches enable local data availability worldwide, and that the combination permits creation of a pipeline to apply image processing techniques to all BBSO images from any location on the planet.
Significance. If the integration and pipeline function as described, the manuscript provides a practical demonstration of federated data infrastructure supporting distributed scientific workflows in solar physics. It illustrates how OSDF can meet NSF-mandated data-sharing requirements and enable global accessibility for observatory datasets, offering a reusable template for similar integrations in other domains.
major comments (1)
- [BBSO use-case section] The central claims that integrating BBSO data into OSDF 'provided standard and reliable data access' and enabled 'a pipeline to apply image processing techniques to all images from BBSO anywhere on the planet' are load-bearing for the use-case narrative but rest on descriptive assertion alone. No quantitative metrics (e.g., cache-hit rates, transfer latencies, error logs, throughput measurements, or before/after comparisons) are supplied to substantiate reliability or worldwide locality at BBSO data volumes.
minor comments (2)
- [Abstract] The abstract states '20 origins and 30 caches' without indicating whether these figures are current, projected, or measured at the time of writing; the manuscript should clarify the exact status and any growth trajectory.
- [Pipeline description] The description of the image-processing pipeline is high-level; adding a brief schematic or pseudocode of the pipeline steps would improve reproducibility without altering the core narrative.
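As an illustration of the kind of pseudocode the second minor comment asks for, a pipeline of this shape might look like the sketch below. It is not the authors' pipeline: the `Image` type, the `threshold` step, the `fetch` accessor, and the sample catalogue are hypothetical stand-ins for real image decoding, processing, and cache access.

```python
from typing import Callable, Iterable

Image = list[list[int]]  # toy stand-in for a decoded solar image


def threshold(image: Image, cutoff: int) -> Image:
    """Mark pixels darker than cutoff with 1 and all others with 0."""
    return [[1 if px < cutoff else 0 for px in row] for row in image]


def run_pipeline(paths: Iterable[str],
                 fetch: Callable[[str], Image],
                 cutoff: int) -> dict[str, Image]:
    """Fetch each image through the supplied accessor and process it."""
    return {path: threshold(fetch(path), cutoff) for path in paths}


# Hypothetical in-memory catalogue standing in for data served by a cache.
catalogue = {
    "img_a": [[10, 200], [90, 30]],
    "img_b": [[255, 255], [0, 128]],
}
masks = run_pipeline(catalogue, catalogue.__getitem__, cutoff=100)
```

Passing the accessor in as a parameter keeps the processing step independent of where the data comes from, so the same code could run at any site with a nearby cache.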
Simulated Author's Rebuttal
Thank you for reviewing our manuscript. We appreciate the positive assessment of the significance of the work and the specific feedback on the BBSO use-case section. Below we address the major comment.
- Referee: [BBSO use-case section] The central claims that integrating BBSO data into OSDF 'provided standard and reliable data access' and enabled 'a pipeline to apply image processing techniques to all images from BBSO anywhere on the planet' are load-bearing for the use-case narrative but rest on descriptive assertion alone. No quantitative metrics (e.g., cache-hit rates, transfer latencies, error logs, throughput measurements, or before/after comparisons) are supplied to substantiate reliability or worldwide locality at BBSO data volumes.
  Authors: We concur that the claims about standard and reliable data access and the global pipeline are central to the use case and would be better supported by quantitative evidence. The manuscript as submitted emphasizes the architectural integration and the conceptual enablement of worldwide processing rather than empirical performance data. To address this concern, we will revise the manuscript to include quantitative metrics drawn from the OSDF monitoring and accounting mechanisms, such as cache hit rates and transfer statistics for BBSO data where available. We will also provide before-and-after context based on prior access methods if such information can be obtained. This will be added as a dedicated paragraph or subsection in the use-case section.
  Revision: yes
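The metrics named in this exchange are simple to derive once per-transfer records exist. The sketch below assumes a hypothetical `Transfer` record; the actual fields exposed by OSDF monitoring are not described in the source, and the sample values are invented for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One data transfer as a monitoring system might log it (hypothetical)."""
    cache_hit: bool
    bytes_moved: int
    seconds: float


def summarize(transfers: list[Transfer]) -> dict[str, float]:
    """Return the cache-hit rate and the aggregate throughput in bytes/s."""
    if not transfers:
        raise ValueError("no transfers to summarize")
    hits = sum(t.cache_hit for t in transfers)
    total_bytes = sum(t.bytes_moved for t in transfers)
    total_seconds = sum(t.seconds for t in transfers)
    return {
        "hit_rate": hits / len(transfers),
        "throughput_bytes_per_s": total_bytes / total_seconds,
    }


# Invented sample records, for illustration only.
sample = [
    Transfer(cache_hit=True, bytes_moved=1000, seconds=1.0),
    Transfer(cache_hit=False, bytes_moved=3000, seconds=3.0),
    Transfer(cache_hit=True, bytes_moved=2000, seconds=1.0),
    Transfer(cache_hit=True, bytes_moved=2000, seconds=3.0),
]
report = summarize(sample)
```

Computing the same summary per cache site, rather than over all transfers at once, would speak more directly to the worldwide-locality claim.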
Circularity Check
No circularity in descriptive infrastructure use-case report
The manuscript is a high-level narrative describing the integration of BBSO data into the existing OSDF infrastructure and the resulting ability to run image-processing pipelines. It contains no equations, fitted parameters, derivations, or self-citations that could form a load-bearing chain. All claims are presented as direct outcomes of the described integration steps rather than results obtained by reducing prior self-referential inputs to themselves. This is a standard non-circular use-case report whose central assertions rest on external infrastructure behavior rather than internal definitional or fitted loops.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: OSDF provides reliable global data distribution and local caching for observatory data