The LHCb Stripping Project: Sustainable Legacy Data Processing for High-Energy Physics
Pith reviewed 2026-05-18 01:58 UTC · model grok-4.3
The pith
The LHCb Stripping project refines LHC collision data into targeted samples that enable continued analysis of Runs 1 and 2 legacy datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The LHCb Stripping project maintains a Python-configurable architecture together with GitLab workflows, continuous integration, and parallelized processing to execute re-Stripping campaigns that turn the full Run 1 and Run 2 collision samples into manageable, analysis-ready outputs while preserving the full software stack for both legacy and live data.
What carries the argument
The Python-configurable Stripping architecture that defines selection criteria and is executed through automated GitLab-managed campaigns on LHCb computing infrastructure.
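To make the idea of a Python-configurable selection concrete, here is a minimal sketch in plain Python. It is not the LHCb Stripping API: the names `SelectionLine`, `build_lines`, `run_lines`, the line names, the variables `pt` and `vertex_quality`, and all thresholds are hypothetical and chosen only for illustration. It shows how a dictionary of settings can be turned into selection objects that filter candidates.

```python
from dataclasses import dataclass
from typing import Dict, List

# A candidate is modelled as a mapping from variable name to value (assumption).
Candidate = Dict[str, float]


@dataclass
class SelectionLine:
    """Hypothetical stand-in for one configurable selection."""
    name: str
    cuts: Dict[str, float]  # minimum thresholds keyed by variable name

    def accepts(self, cand: Candidate) -> bool:
        # A candidate passes when every configured variable meets its minimum.
        # A missing variable is treated as failing the cut.
        return all(
            cand.get(var, float("-inf")) >= minimum
            for var, minimum in self.cuts.items()
        )


def build_lines(config: Dict[str, Dict[str, Dict[str, float]]]) -> List[SelectionLine]:
    """Turn a plain configuration dictionary into selection objects."""
    return [SelectionLine(name=name, **options) for name, options in config.items()]


def run_lines(lines: List[SelectionLine], candidates: List[Candidate]) -> Dict[str, List[Candidate]]:
    """Apply each line independently and collect the candidates it keeps."""
    return {line.name: [c for c in candidates if line.accepts(c)] for line in lines}


# Illustrative configuration and inputs; none of these values come from the paper.
CONFIG = {
    "DiMuonSketch": {"cuts": {"pt": 1000.0, "vertex_quality": 0.5}},
    "HighPtSketch": {"cuts": {"pt": 5000.0}},
}
CANDIDATES = [
    {"pt": 1500.0, "vertex_quality": 0.9},
    {"pt": 6000.0, "vertex_quality": 0.2},
    {"pt": 500.0, "vertex_quality": 0.8},
]

selected = run_lines(build_lines(CONFIG), CANDIDATES)
```

Keeping the thresholds in a dictionary rather than in the selection code means that a change to the criteria is a configuration change, which is the kind of edit that a review-and-merge workflow can track.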
If this is right
- Legacy Runs 1 and 2 data remain available for new physics measurements without full re-processing of raw records.
- Large-scale Stripping campaigns can be launched and monitored through standardized GitLab procedures and continuous integration checks.
- The same infrastructure supports both historical re-analysis and processing of newer data collections.
- Organizational practices such as automation and parallel execution reduce the human effort needed to maintain the data samples.
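The parallel-execution point in the last bullet can be illustrated with a small local sketch. It assumes that input files can be processed independently and their counters merged afterwards. The thread pool is only a stand-in for whatever distributed infrastructure a real campaign uses, and the function names, file names, and threshold are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def process_file(events: List[float], threshold: float = 1000.0) -> Dict[str, int]:
    """Count how many events in one hypothetical input file pass a cut."""
    kept = sum(1 for value in events if value >= threshold)
    return {"seen": len(events), "kept": kept}


def run_campaign(files: Dict[str, List[float]], max_workers: int = 4) -> Dict[str, int]:
    """Process files independently, then merge the per-file counters."""
    totals = {"seen": 0, "kept": 0}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order; merging happens in this thread.
        for result in pool.map(process_file, files.values()):
            totals["seen"] += result["seen"]
            totals["kept"] += result["kept"]
    return totals


# Illustrative inputs only; each list stands for the events in one file.
FILES = {
    "file_a": [500.0, 1500.0, 2500.0],
    "file_b": [100.0, 200.0],
    "file_c": [3000.0],
}

totals = run_campaign(FILES)
```

Because each file is handled without shared state, the work scales by adding workers, and a failed file can be retried without repeating the others.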
Where Pith is reading between the lines
- Other high-energy physics experiments facing similar data-volume growth could adapt the configurable campaign model to manage their own legacy archives.
- The emphasis on keeping the full software stack alive alongside the data suggests a general pattern for long-term reproducibility in big-science computing.
- If automation scales as described, the computational cost of repeated Stripping passes may stay manageable even as dataset sizes increase.
Load-bearing premise
The described Python architecture, GitLab workflows, and automation will continue to function effectively for both legacy and current data processing as the collaboration shifts its focus to newer runs.
What would settle it
An inability to produce valid re-Stripped samples from the Run 1 or Run 2 datasets using the current framework after a software-stack update would show the sustainability claim does not hold.
Original abstract
The LHCb Stripping project is a pivotal component of the experiment's data processing framework, designed to refine vast volumes of collision data into manageable samples for offline analysis. It ensures the re-analysis of Runs 1 and 2 legacy data, maintains the software stack, and executes (re-)Stripping campaigns. As the focus shifts toward newer data sets, the project continues to optimize infrastructure for both legacy and live data processing. This paper provides a comprehensive overview of the Stripping framework, detailing its Python-configurable architecture, integration with LHCb computing systems, and large-scale campaign management. We highlight organizational advancements such as GitLab-based workflows, continuous integration, automation, and parallelized processing, alongside computational challenges. Finally, we discuss lessons learned and outline a future road-map to sustain efficient access to valuable physics legacy data sets for the LHCb collaboration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents the LHCb Stripping project as a key element in the experiment's data processing pipeline, aimed at distilling large amounts of collision data into manageable samples for offline analysis. It focuses on ensuring the re-analysis of legacy data from Runs 1 and 2, maintaining the software stack, and conducting (re-)Stripping campaigns. The paper describes the Python-configurable architecture, its integration with LHCb computing systems, GitLab-based workflows, continuous integration, automation, parallelized processing, computational challenges, lessons learned, and a future roadmap for sustaining efficient access to legacy physics data.
Significance. If the framework and practices described are effective, this paper offers valuable insights into sustainable data processing strategies for high-energy physics experiments dealing with legacy datasets. It highlights organizational and technical advancements that could serve as a model for other collaborations. However, the lack of quantitative metrics on performance and scalability makes its impact difficult to assess at present.
major comments (1)
- [Abstract and section on organizational advancements / computational challenges] The central claim that the Python-configurable architecture, GitLab workflows, CI automation, and parallelized processing will remain effective and sustainable for both legacy Runs 1-2 re-stripping and newer datasets (abstract and roadmap section) lacks any quantitative benchmarks such as throughput rates, CPU/memory utilization, campaign success rates, or scaling behavior under increased data volumes. This is load-bearing for the sustainability and optimization assertions.
minor comments (1)
- [Abstract] The abstract refers to 'optimizations' and 'computational challenges' without enumerating them; adding one or two concrete examples would improve clarity for readers unfamiliar with LHCb workflows.
Simulated Author's Rebuttal
We thank the referee for their detailed review and constructive comments on the manuscript. The observation that quantitative benchmarks are needed to support claims of sustainability and effectiveness is valid, and we will revise the relevant sections to incorporate available performance data from the Stripping campaigns.
Point-by-point responses
-
Referee: [Abstract and section on organizational advancements / computational challenges] The central claim that the Python-configurable architecture, GitLab workflows, CI automation, and parallelized processing will remain effective and sustainable for both legacy Runs 1-2 re-stripping and newer datasets (abstract and roadmap section) lacks any quantitative benchmarks such as throughput rates, CPU/memory utilization, campaign success rates, or scaling behavior under increased data volumes. This is load-bearing for the sustainability and optimization assertions.
Authors: We agree that the absence of quantitative benchmarks weakens the support for the sustainability assertions in the abstract and roadmap. The manuscript is structured as an overview of the framework architecture, workflows, and organizational practices rather than a dedicated performance analysis. Nevertheless, we can draw on internal campaign records to add summary statistics, including typical event throughput, observed CPU and memory utilization during parallel processing, and success rates from the Runs 1-2 re-stripping campaigns. We will insert a short quantitative summary into the computational challenges section and cross-reference it from the abstract and roadmap. Projections for scaling to newer datasets will be added as a qualitative discussion based on current infrastructure. These changes will be included in the revised manuscript.
Revision: yes
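As an illustration of the kind of summary statistics discussed in this exchange, the following sketch computes a job success rate and an event throughput from per-job records. The record layout (`JobRecord` with `events`, `wall_seconds`, `succeeded`) is an assumption, and the numbers are placeholders, not figures from any LHCb campaign.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobRecord:
    """Hypothetical per-job bookkeeping entry."""
    events: int          # events processed by the job
    wall_seconds: float  # wall-clock time the job took
    succeeded: bool      # whether the job finished successfully


def summarize(records: List[JobRecord]) -> Dict[str, float]:
    """Return the success rate and the throughput of successful jobs."""
    successful = [r for r in records if r.succeeded]
    events = sum(r.events for r in successful)
    seconds = sum(r.wall_seconds for r in successful)
    return {
        "success_rate": len(successful) / len(records) if records else 0.0,
        "events_per_second": events / seconds if seconds else 0.0,
    }


# Placeholder records for demonstration only.
RECORDS = [
    JobRecord(events=10000, wall_seconds=500.0, succeeded=True),
    JobRecord(events=20000, wall_seconds=1000.0, succeeded=True),
    JobRecord(events=0, wall_seconds=50.0, succeeded=False),
    JobRecord(events=30000, wall_seconds=1500.0, succeeded=True),
]

summary = summarize(RECORDS)
```

Throughput here is computed over successful jobs only. Whether failed jobs should count toward the denominator is a reporting choice that a real summary would need to state.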
Circularity Check
No circularity: purely descriptive project overview
Full rationale
The manuscript is a factual description of the LHCb Stripping framework, its Python-configurable architecture, GitLab workflows, CI automation, and campaign management practices. It contains no equations, no fitted parameters, no predictions of derived quantities, and no load-bearing self-citations that reduce the central claims to unverified inputs. All content is self-contained reporting on existing infrastructure and organizational changes, with no derivation chain that collapses by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Existing LHCb computing systems and software stacks provide a reliable foundation for data processing campaigns.
Forward citations
Cited by 16 Pith papers
-
Evidence for the decay $B^0_s\to\phi\eta'$
First evidence for B_s^0 to phi eta-prime decay with relative branching ratio (3.56 ± 0.79 ± 0.18 ± 0.06) x 10^{-2} and absolute branching fraction (0.66 ± 0.15 ± 0.03 ± 0.02) x 10^{-6}.
-
Observation of the charmless purely baryonic decay $\mathinner{\mathit{\Lambda}^0_b\!\to \mathit{\Lambda} p \overline{p}}$
First observation of Λ_b^0 → Λ p p-bar with 5.1σ significance and relative branching fraction (5.1 ± 1.3(stat) ± 0.3(syst)) × 10^{-2} to the reference mode Λ_b^0 → Λ K^+ K^-.
-
Angular analysis of the $B^+\to\pi^+\mu^+\mu^-$ decay
First measurement of A_FB and F_H in B+→π+μ+μ− decay is consistent with Standard Model predictions in both high- and low-mass dimuon regions.
-
Observation of a new excited charm-strange meson $D_{s1}(2933)^+$ in $B^0\to D^+ D^- K^+ \pi^-$ decays
A new charm-strange resonance D_s1(2933)^+ with J^P=1^+ is observed at >10 sigma in B^0 to D+ D- K+ pi- decays, with measured mass 2933 MeV and width 72 MeV.
-
Study of the $B^0 \to \Lambda_c^+ \bar{\Lambda}_c^- K_S^0$ decay
Relative branching fraction B(B0 → Λc+ Λc- KS0)/B(B+ → Λc+ Λc- K+) measured as 0.53 ± 0.05 ± 0.05 with 3.9σ evidence for Ξc(2923)+ and Ξc(2939)+ resonances consistent with isospin partners.
-
Measurement of the CKM angle $\gamma$ in $B^{\pm} \rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-})h^{\pm}$ decays with a novel approach
A novel model-independent approach with per-event phase-space weights on combined BESIII and LHCb data measures the CKM angle γ as (71.3 ± 5.0)° in B± → D(→ K0S h'+h'-) h± decays.
-
Observation of the decay $\chi_{c1}(3872)\rightarrow J\mskip -3mu/\mskip -2mu\psi \mu^+\mu^-$
First observation of χ_c1(3872) → J/ψ μ⁺μ⁻ reported at 6.5σ significance with branching fraction ratio (1.68 ± 0.37) × 10^{-3} relative to the π⁺π⁻ mode.
-
Test of lepton flavour universality with $B^0\to K^{*0}\ell^+\ell^-$ decays at large dilepton invariant mass
R_K*0 is measured as 1.08^{+0.14}_{-0.12}(stat) ± 0.07(syst) for q² > 14 GeV²/c⁴ in B⁰ → K*⁰ ℓ⁺ℓ⁻ decays, consistent with the Standard Model.
-
Search for the lepton-flavour violating decays $B^+ \to \pi^+ \mu^\pm e^\mp$
No signal observed for B+ → π+ μ± e∓; branching fraction upper limit set at 1.8 × 10^{-9} at 90% CL.
-
Measurement of charged-hadron distributions in heavy-flavor jets in proton-proton collisions at $\sqrt{s}$=13 TeV
Charged-hadron distributions in heavy-flavor jets differ from light-quark jets in ways consistent with dead-cone suppression and hard fragmentation of the heavy hadron.
-
Search for $K_{\mathrm{S(L)}}^{0} \rightarrow \pi^{+}\pi^{-}\mu^{+}\mu^{-}$ decays at LHCb
No evidence for KS0 or KL0 to pi+ pi- mu+ mu- decays; first upper limits set at 1.4e-9 and 6.6e-7 (90% CL).
-
Measurement of inclusive production of charmonium states in $b$-hadron decays via their decay into $\phi \phi$
LHCb reports branching fractions B(b→χ_c0,1,2 X) and B(b→η_c(2S)X)×B(η_c(2S)→φφ) plus the most precise η_c(1S) mass from φφ decays in 5.9 fb⁻¹ of data.
-
Measurement of the $W$-boson production cross-sections in $pp$ collisions at $\sqrt{s}$ = 13 TeV in the forward region
LHCb measures forward W+ and W- production cross-sections of 1754.2 pb and 1178.1 pb at 13 TeV, agreeing with NNLO QCD predictions at higher precision than prior results.
-
Measurement of the branching fractions and longitudinal polarisations of $B^0_{(s)} \to K^{*0} \kern 0.18em \overline{\kern -0.18em K}{}^{*0}$ decays
LHCb measures f_L^d = 0.600 and f_L^s = 0.159 for B to K* Kbar* decays and reports a ratio L of 4.92 that confirms a 4.4 sigma discrepancy with theory.
-
Branching fraction measurement of the $\mathit{\Lambda} \to p \mu^- \overline{\nu}_{\mu}$ decay
Branching fraction B(Λ → p μ⁻ ν̄_μ) measured as (1.462 ± 0.016 ± 0.100 ± 0.011) × 10^{-4}, improving prior precision by a factor of two and yielding R^{μe} = 0.175 ± 0.012 consistent with the Standard Model.
-
Search for the decays $B_{(s)}^0\to J/\psi\gamma$ at LHCb
Upper limits of 2.9×10^{-6} for B_s^0 and 2.5×10^{-6} for B^0 on the branching fractions to J/ψγ at 90% CL, with the B_s limit improved by a factor of 2.5.