Distributed data storage for modern astroparticle physics experiments
Pith reviewed 2026-05-24 20:52 UTC · model grok-4.3
The pith
A distributed storage system combines data from KASCADE and TAIGA experiments into unified samples for multi-messenger analysis.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The distributed storage, built on the single write-multiple read (SWMR) model, provides a dedicated aggregation service that combines data from different experiments into a single sample, which supports multi-messenger methods for data exploration.
What carries the argument
The aggregation service that extracts data as aggregated events from different sources into a single sample.
If this is right
- Users gain direct access to combined event samples from multiple experiments without manual merging of separate files.
- Experimental groups can share raw or processed data through a common repository using the SWMR access pattern.
- Multi-messenger analysis methods become applicable across the participating experiments via the aggregation service.
- Both web and API interfaces allow participants to retrieve either file collections or aggregated data sets.
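As an illustration of what a combined event sample could look like, the following minimal sketch merges two time-sorted event streams into one. The paper does not specify how events are matched or ordered, so the `Event` fields, the `aggregate` function, and the use of a timestamp as the merge key are all assumptions made for this example, not the project's actual data model.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal event record; real formats are not given in the paper."""
    timestamp: float  # assumed: event time in seconds on a shared clock
    source: str       # name of the experiment that recorded the event
    payload: dict     # experiment-specific quantities, left uninterpreted


def aggregate(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge several time-sorted streams into one time-sorted sample."""
    return heapq.merge(*streams, key=lambda event: event.timestamp)


# Toy inputs; the field names inside payload are invented for illustration.
kascade = [Event(1.0, "KASCADE", {"n_e": 10}), Event(4.0, "KASCADE", {"n_e": 12})]
taiga = [Event(2.5, "TAIGA", {"size": 7})]

sample = list(aggregate(kascade, taiga))
sources = [event.source for event in sample]
```

Each event keeps its `source` tag, so a user of the merged sample can still tell which experiment a record came from. Because `heapq.merge` is lazy, the inputs are never loaded as a whole, which would matter if the sources were large remote files.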
Where Pith is reading between the lines
- The same aggregation approach could extend to additional astroparticle experiments if their data formats prove compatible.
- Performance of the aggregation service on large combined samples would determine practical limits for real-time multi-messenger queries.
- Adoption would require verification that the unified interface preserves all necessary metadata from each source experiment.
Load-bearing premise
Data from the KASCADE and TAIGA experiments can be effectively combined under a unified interface and aggregated into single samples without major compatibility or information loss issues.
What would settle it
An attempt to aggregate real data from KASCADE and TAIGA that produces either significant information loss or incompatible event samples under the unified interface.
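A check of this kind can be sketched as a mapping step that reports which source fields do not survive the conversion to a common schema. The mapping table, the field names, and the `to_common` helper below are hypothetical; the paper supplies no schema for either experiment.

```python
from typing import Any

# Hypothetical mapping from one experiment's field names to a common schema.
TAIGA_TO_COMMON = {"event_time": "timestamp", "n_pixels": "size"}


def to_common(record: dict[str, Any], mapping: dict[str, str]) -> tuple[dict[str, Any], set[str]]:
    """Return the mapped record and the set of source fields that were dropped."""
    mapped = {mapping[name]: value for name, value in record.items() if name in mapping}
    dropped = set(record) - set(mapping)
    return mapped, dropped


# Toy record with one field ("hv_state") that the mapping does not cover.
record = {"event_time": 1548288000.0, "n_pixels": 42, "hv_state": "ok"}
mapped, dropped = to_common(record, TAIGA_TO_COMMON)
```

A non-empty `dropped` set for real records would be direct evidence of information loss under the unified interface, while an empty set across a representative sample would support the premise.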
Original abstract
The German-Russian Astroparticle Data Life Cycle Initiative is an international project launched in 2018. The Initiative aims to develop technologies that provide a unified approach to data management, as well as to demonstrate their applicability on the example of two large astrophysical experiments - KASCADE and TAIGA. One of the key points of the project is the development of a distributed storage, which, on the one hand, will allow data of several experiments to be combined into a single repository with unified interface, and on the other hand, will provide data to all participants of experimental groups for multi-messenger analysis. Our approach to storage design is based on the single write-multiple read (SWMR) model for accessing raw or centrally processed data for further analysis. The main feature of the distributed storage is the ability to extract data either as a collection of files or as aggregated events from different sources. In the last case the storage provides users with a special service that aggregates data from different storages into a single sample. Thanks to this feature, multi-messenger methods used for more sophisticated data exploration can be applied. Users can use both Web-interface and Application Programming Interface (API) for accessing the storage. In this paper we describe the architecture of a distributed data storage for astroparticle physics and discuss the current status of our work.
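The SWMR access pattern named in the abstract can be illustrated with a small in-memory sketch. It assumes one reading of the model, in which a single designated writer adds data and every other client can only read. The `SWMRStore` class, the client identifiers, and the file name are invented for this example and do not describe the project's implementation.

```python
class SWMRStore:
    """Toy store: one designated writer, any number of readers."""

    def __init__(self, writer_id: str) -> None:
        self._writer_id = writer_id
        self._files: dict[str, bytes] = {}

    def write(self, client_id: str, name: str, data: bytes) -> None:
        # Only the designated writer may add or replace data.
        if client_id != self._writer_id:
            raise PermissionError("only the designated writer may add data")
        self._files[name] = bytes(data)

    def read(self, name: str) -> bytes:
        # bytes objects are immutable, so readers cannot alter stored data.
        return self._files[name]


store = SWMRStore(writer_id="taiga-daq")
store.write("taiga-daq", "run001.dat", b"\x01\x02")
data = store.read("run001.dat")

# A client other than the writer is refused.
try:
    store.write("analyst", "run001.dat", b"tampered")
    rejected = False
except PermissionError:
    rejected = True
```

Under this reading, analysis users never need write access to the repository, which keeps raw and centrally processed data stable for everyone who reads it.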
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the architecture of a distributed data storage system for the German-Russian Astroparticle Data Life Cycle Initiative. It targets unification of data from the KASCADE and TAIGA experiments under a single interface to support multi-messenger analysis. The design uses a single-writer multiple-reader (SWMR) model for raw or processed data, with the key capability to extract either collections of files or aggregated events via a dedicated aggregation service; access is provided through both a Web interface and an API. The paper presents the high-level design choices and reports on the current status of the work.
Significance. If the aggregation service and unified interface are realized with demonstrated compatibility, the system would offer a practical platform for combining heterogeneous astroparticle datasets, directly enabling the multi-messenger methods highlighted in the abstract. The dual file-versus-event extraction mode is a pragmatic feature that aligns with real analysis workflows. The work is at the design-and-status stage rather than a completed implementation with performance data.
major comments (1)
- [Architecture / main feature description] Architecture description (abstract and main text): the central claim that the storage supplies a special aggregation service turning data from separate experiments into unified single samples rests on an unelaborated assumption. No data model, schema-mapping rules, or analysis of compatibility between KASCADE and TAIGA event formats is supplied, leaving open the risk of information loss or incompatibility that the skeptic note correctly flags as load-bearing for the multi-messenger use case.
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive feedback on our architecture description paper. The single major comment is addressed point-by-point below. We agree that the aggregation service claim requires clearer scoping in the text.
- Referee: [Architecture / main feature description] Architecture description (abstract and main text): the central claim that the storage supplies a special aggregation service turning data from separate experiments into unified single samples rests on an unelaborated assumption. No data model, schema-mapping rules, or analysis of compatibility between KASCADE and TAIGA event formats is supplied, leaving open the risk of information loss or incompatibility that the skeptic note correctly flags as load-bearing for the multi-messenger use case.
  Authors: We agree with the observation. The manuscript is an architecture and status paper focused on the SWMR-based distributed storage, web/API access, and the high-level design of the aggregation service for file collections or event samples. No data model, schema mappings, or KASCADE-TAIGA compatibility analysis appears because those elements lie outside the paper's scope; they form part of separate ongoing work in the German-Russian Astroparticle Data Life Cycle Initiative. The aggregation service is presented as a planned capability rather than a completed implementation. We will revise the abstract and relevant sections to explicitly state the current design-stage status, note that detailed compatibility studies are in progress, and avoid implying that unified samples are already realized. This clarification directly addresses the concern about unelaborated assumptions and the risk of information loss.
  Revision: yes
Circularity Check
No circularity: purely descriptive architecture paper with no derivations or fitted claims
The manuscript is a high-level description of planned distributed storage infrastructure (SWMR model, file vs. event extraction, aggregation service, Web/API access) for KASCADE and TAIGA data. It contains no equations, no parameter fitting, no uniqueness theorems, no self-citations used as load-bearing premises, and no predictive claims that reduce to inputs by construction. The aggregation-service capability is asserted at the architectural level without any mathematical reduction or self-referential justification, so the paper's content is self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Data from KASCADE and TAIGA experiments can be aggregated into single samples for multi-messenger analysis using a unified interface.