pith. machine review for the scientific record.

arxiv: 2509.05294 · v1 · pith:36NVS7QN · submitted 2025-09-05 · ✦ hep-ex

The LHCb Stripping Project: Sustainable Legacy Data Processing for High-Energy Physics

Pith reviewed 2026-05-18 01:58 UTC · model grok-4.3

classification ✦ hep-ex
keywords LHCb · Stripping · legacy data · data processing · high-energy physics · computing framework · GitLab workflows · Run 1 and Run 2

The pith

The LHCb Stripping project refines LHC collision data into targeted samples that enable continued analysis of Runs 1 and 2 legacy datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the Stripping project as the mechanism that converts the enormous volume of reconstructed collision data into smaller, physics-relevant subsets for offline study. It shows how the framework sustains access to earlier LHCb runs even while the experiment prioritizes newer data collections. The work details a Python-based configuration layer, integration with existing LHCb computing services, and the use of GitLab-driven automation to run repeated large-scale campaigns. Together, these elements keep legacy data usable without requiring full re-processing of the entire original dataset each time.

Core claim

The LHCb Stripping project maintains a Python-configurable architecture together with GitLab workflows, continuous integration, and parallelized processing to execute re-Stripping campaigns that turn the full Run 1 and Run 2 collision samples into manageable, analysis-ready outputs while preserving the full software stack for both legacy and live data.
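A minimal sketch of what "parallelized processing" in a re-Stripping campaign amounts to: the input is split into independent files, each file passes through the same selection, and the per-stream outputs are merged. Everything here is an assumption for illustration. The file names, the event dictionaries, the two stream names and the cuts are hypothetical, and a thread pool stands in for what, at LHCb scale, would be separate jobs distributed on the grid (the reference list includes DIRAC, LHCb's distributed production system).

```python
"""Toy sketch of a parallelised re-Stripping pass (all names hypothetical)."""
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import random


def load_events(input_file: str, n_events: int = 1000) -> list[dict]:
    """Fake 'reconstructed' events; seeded from the file name so reruns agree."""
    rng = random.Random(input_file)
    return [{"pt": rng.uniform(0.0, 10.0), "n_muons": rng.randint(0, 3)}
            for _ in range(n_events)]


def strip_file(input_file: str) -> Counter:
    """Apply a toy two-stream selection to one file; count events per stream."""
    counts = Counter()
    for event in load_events(input_file):
        if event["n_muons"] >= 2:
            counts["Dimuon"] += 1
        elif event["pt"] > 5.0:
            counts["Bhadron"] += 1
    return counts


def run_campaign(input_files: list[str], max_workers: int = 4) -> Counter:
    """Process files concurrently, then merge the per-stream counts."""
    total = Counter()
    # Threads keep the sketch portable; a real campaign distributes grid jobs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for counts in pool.map(strip_file, input_files):
            total.update(counts)
    return total


if __name__ == "__main__":
    files = [f"run2_file_{i:03d}.dst" for i in range(8)]
    print(dict(run_campaign(files)))
```

Because each file's toy events are seeded from its name, the merged counts do not depend on the number of workers. Any real campaign needs the same property if failed jobs are to be resubmitted without changing the output.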

What carries the argument

The Python-configurable Stripping architecture that defines selection criteria and is executed through automated GitLab-managed campaigns on LHCb computing infrastructure.
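To make "Python-configurable" concrete, the sketch below expresses selection lines as plain configuration objects and routes events into output streams by applying them. It is a toy under stated assumptions: LineConfig, passes, route, the cut names and the stream names are all hypothetical. They are not the classes or configuration keys of the real Stripping framework, which builds its lines on top of LHCb's own software stack.

```python
"""Toy sketch of configuration-driven selection lines (all names hypothetical)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class LineConfig:
    name: str
    stream: str
    min_pt: float        # minimum transverse momentum (toy units)
    min_muons: int = 0   # minimum number of muon candidates


def passes(line: LineConfig, event: dict) -> bool:
    """Return True if the event satisfies every cut of the line."""
    return event["pt"] >= line.min_pt and event["n_muons"] >= line.min_muons


def route(lines: list[LineConfig], events: list[dict]) -> dict[str, list[int]]:
    """Map each stream to the indices of events accepted by any of its lines."""
    streams: dict[str, list[int]] = {ln.stream: [] for ln in lines}
    for i, event in enumerate(events):
        for stream in {ln.stream for ln in lines if passes(ln, event)}:
            streams[stream].append(i)
    return streams


# Changing a selection means editing this list, not compiled code.
LINES = [
    LineConfig(name="DimuonLine", stream="Dimuon", min_pt=1.0, min_muons=2),
    LineConfig(name="HighPtLine", stream="Bhadron", min_pt=5.0),
]
events = [{"pt": 6.0, "n_muons": 2}, {"pt": 0.5, "n_muons": 2}, {"pt": 7.5, "n_muons": 0}]
print(route(LINES, events))  # {'Dimuon': [0], 'Bhadron': [0, 2]}
```

The design point the sketch illustrates is that selections live in configuration data, so adding or retuning a line leaves the processing code untouched. That separation plausibly eases contributions from many analysts to one campaign.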

If this is right

  • Legacy Runs 1 and 2 data remain available for new physics measurements without full re-processing of raw records.
  • Large-scale Stripping campaigns can be launched and monitored through standardized GitLab procedures and continuous integration checks.
  • The same infrastructure supports both historical re-analysis and processing of newer data collections.
  • Organizational practices such as automation and parallel execution reduce the human effort needed to maintain the data samples.
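As an illustration of the standardized GitLab procedures and continuous integration checks listed above, here is the kind of small validation script a CI job could run on line configurations before a merge. The configuration layout and both rules are hypothetical, and this review does not describe the checks actually run in the LHCb Stripping pipelines. The only convention relied on is general: a GitLab CI job fails when its script exits with a non-zero status.

```python
"""Toy pre-merge check on line configurations (layout and rules hypothetical)."""

REQUIRED_KEYS = {"stream", "prescale"}


def validate(configs: dict[str, dict]) -> list[str]:
    """Return human-readable problems; an empty list means the check passes."""
    problems = []
    for name, cfg in configs.items():
        missing = REQUIRED_KEYS - set(cfg)
        if missing:
            problems.append(f"{name}: missing keys {sorted(missing)}")
        prescale = cfg.get("prescale")
        if prescale is not None and not 0.0 < prescale <= 1.0:
            problems.append(f"{name}: prescale {prescale} outside (0, 1]")
    return problems


def ci_exit_code(configs: dict[str, dict]) -> int:
    """Print any problems and return the exit status a CI job should use."""
    problems = validate(configs)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

In a pipeline, the script would end with `sys.exit(ci_exit_code(configs))`, so that a malformed line blocks the merge instead of surfacing mid-campaign.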

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Other high-energy physics experiments facing similar data-volume growth could adapt the configurable campaign model to manage their own legacy archives.
  • The emphasis on keeping the full software stack alive alongside the data suggests a general pattern for long-term reproducibility in big-science computing.
  • If automation and parallel processing scale as described, the cost of repeated Stripping passes, in both operator effort and turnaround time, may stay manageable even as dataset sizes increase.

Load-bearing premise

The described Python architecture, GitLab workflows, and automation will continue to function effectively for both legacy and current data processing as the collaboration moves focus to newer runs.

What would settle it

An inability to produce valid re-Stripped samples from the Run 1 or Run 2 datasets using the current framework after a software-stack update would show the sustainability claim does not hold.
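One way to operationalize this test, sketched under assumptions: re-Strip a fixed reference sample with the old and the updated software stack and compare how many events each line selects. The line names, counts and the 1% tolerance are hypothetical. Matching counts are only a necessary condition; a fuller check would also compare the content of the selected candidates.

```python
"""Toy regression check after a software-stack update (inputs hypothetical)."""


def compare_retention(reference: dict[str, int], candidate: dict[str, int],
                      rel_tol: float = 0.01) -> dict[str, str]:
    """Flag lines whose selected-event count changed by more than rel_tol,
    or that exist in only one of the two outputs. Empty result = pass."""
    report = {}
    for line in sorted(reference.keys() | candidate.keys()):
        ref, new = reference.get(line), candidate.get(line)
        if ref is None:
            report[line] = "new line"
        elif new is None:
            report[line] = "missing in candidate"
        elif ref == 0:
            if new != 0:
                report[line] = f"0 -> {new}"
        elif abs(new - ref) / ref > rel_tol:
            report[line] = f"{ref} -> {new} ({(new - ref) / ref:+.1%})"
    return report


ref = {"DimuonLine": 1000, "HighPtLine": 500, "OldLine": 20}
new = {"DimuonLine": 1005, "HighPtLine": 450, "NewLine": 3}
print(compare_retention(ref, new))
```

Here DimuonLine moves by 0.5% and passes, while HighPtLine (a 10% drop) and the appearing and disappearing lines are reported.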

Original abstract

The LHCb Stripping project is a pivotal component of the experiment's data processing framework, designed to refine vast volumes of collision data into manageable samples for offline analysis. It ensures the re-analysis of Runs 1 and 2 legacy data, maintains the software stack, and executes (re-)Stripping campaigns. As the focus shifts toward newer data sets, the project continues to optimize infrastructure for both legacy and live data processing. This paper provides a comprehensive overview of the Stripping framework, detailing its Python-configurable architecture, integration with LHCb computing systems, and large-scale campaign management. We highlight organizational advancements such as GitLab-based workflows, continuous integration, automation, and parallelized processing, alongside computational challenges. Finally, we discuss lessons learned and outline a future road-map to sustain efficient access to valuable physics legacy data sets for the LHCb collaboration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents the LHCb Stripping project as a key element in the experiment's data processing pipeline, aimed at distilling large amounts of collision data into manageable samples for offline analysis. It focuses on ensuring the re-analysis of legacy data from Runs 1 and 2, maintaining the software stack, and conducting (re-)Stripping campaigns. The paper describes the Python-configurable architecture, its integration with LHCb computing systems, GitLab-based workflows, continuous integration, automation, parallelized processing, computational challenges, lessons learned, and a future roadmap for sustaining efficient access to legacy physics data.

Significance. If the framework and practices described are effective, this paper offers valuable insights into sustainable data processing strategies for high-energy physics experiments dealing with legacy datasets. It highlights organizational and technical advancements that could serve as a model for other collaborations. However, the lack of quantitative metrics on performance and scalability reduces the immediate assessable impact.

major comments (1)
  1. [Abstract and section on organizational advancements / computational challenges] The central claim that the Python-configurable architecture, GitLab workflows, CI automation, and parallelized processing will remain effective and sustainable for both legacy Runs 1-2 re-stripping and newer datasets (abstract and roadmap section) lacks any quantitative benchmarks such as throughput rates, CPU/memory utilization, campaign success rates, or scaling behavior under increased data volumes. This is load-bearing for the sustainability and optimization assertions.
minor comments (1)
  1. [Abstract] The abstract refers to 'optimizations' and 'computational challenges' without enumerating them; adding one or two concrete examples would improve clarity for readers unfamiliar with LHCb workflows.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their detailed review and constructive comments on the manuscript. The observation that quantitative benchmarks are needed to support claims of sustainability and effectiveness is valid, and we will revise the relevant sections to incorporate available performance data from the Stripping campaigns.

Point-by-point responses
  1. Referee: [Abstract and section on organizational advancements / computational challenges] The central claim that the Python-configurable architecture, GitLab workflows, CI automation, and parallelized processing will remain effective and sustainable for both legacy Runs 1-2 re-stripping and newer datasets (abstract and roadmap section) lacks any quantitative benchmarks such as throughput rates, CPU/memory utilization, campaign success rates, or scaling behavior under increased data volumes. This is load-bearing for the sustainability and optimization assertions.

    Authors: We agree that the absence of quantitative benchmarks weakens the support for the sustainability assertions in the abstract and roadmap. The manuscript is structured as an overview of the framework architecture, workflows, and organizational practices rather than a dedicated performance analysis. Nevertheless, we can draw on internal campaign records to add summary statistics, including typical event throughput, observed CPU and memory utilization during parallel processing, and success rates from the Runs 1-2 re-stripping campaigns. We will insert a short quantitative summary into the computational challenges section and cross-reference it from the abstract and roadmap. Projections for scaling to newer datasets will be added as a qualitative discussion based on current infrastructure. These changes will be included in the revised manuscript. revision: yes
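The summary statistics promised in this response could be computed from per-job records along these lines. This is a hypothetical sketch: the JobRecord fields, the choice of median peak memory and the toy numbers in the usage example are illustrative inputs, not LHCb campaign data.

```python
"""Toy campaign summary from per-job records (fields and numbers hypothetical)."""
import math
from dataclasses import dataclass
from statistics import median


@dataclass
class JobRecord:
    events_in: int       # events read by the job
    cpu_seconds: float   # CPU time consumed
    peak_rss_mb: float   # peak resident memory
    succeeded: bool


def summarise(jobs: list[JobRecord]) -> dict[str, float]:
    """Success rate over all jobs; throughput and memory over successful jobs."""
    if not jobs:
        raise ValueError("no job records")
    done = [j for j in jobs if j.succeeded]
    cpu = sum(j.cpu_seconds for j in done)
    return {
        "success_rate": len(done) / len(jobs),
        "events_per_cpu_second": sum(j.events_in for j in done) / cpu if cpu > 0 else math.nan,
        "median_peak_rss_mb": median(j.peak_rss_mb for j in done) if done else math.nan,
    }


toy_jobs = [
    JobRecord(10_000, 500.0, 1800.0, True),
    JobRecord(12_000, 600.0, 2100.0, True),
    JobRecord(8_000, 0.0, 0.0, False),
    JobRecord(10_000, 500.0, 1900.0, True),
]
print(summarise(toy_jobs))
# {'success_rate': 0.75, 'events_per_cpu_second': 20.0, 'median_peak_rss_mb': 1900.0}
```

Excluding failed jobs from the throughput keeps it a measure of the software; the CPU spent on failures would be worth reporting separately.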

Circularity Check

0 steps flagged

No circularity: purely descriptive project overview

Full rationale

The manuscript is a factual description of the LHCb Stripping framework, its Python-configurable architecture, GitLab workflows, CI automation, and campaign management practices. It contains no equations, no fitted parameters, no predictions of derived quantities, and no load-bearing self-citations that reduce the central claims to unverified inputs. All content is self-contained reporting on existing infrastructure and organizational changes, with no derivation chain that collapses by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

No free parameters or invented entities appear, as the work is a project description rather than a theoretical or data-analysis paper. The central claims rest on domain assumptions about the effectiveness of standard HEP computing practices and the described workflows.

axioms (1)
  • Domain assumption: existing LHCb computing systems and software stacks provide a reliable foundation for data processing campaigns.
    The overview assumes without further justification that the integrated infrastructure supports the claimed optimizations and sustainability.

pith-pipeline@v0.9.0 · 5690 in / 1176 out tokens · 43118 ms · 2026-05-18T01:58:16.009340+00:00 · methodology

Forward citations

Cited by 16 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Evidence for the decay $B^0_s\to\phi\eta'$

    hep-ex 2026-05 conditional novelty 8.0

    First evidence for B_s^0 to phi eta-prime decay with relative branching ratio (3.56 ± 0.79 ± 0.18 ± 0.06) x 10^{-2} and absolute branching fraction (0.66 ± 0.15 ± 0.03 ± 0.02) x 10^{-6}.

  2. Observation of the charmless purely baryonic decay $\mathinner{\mathit{\Lambda}^0_b\!\to \mathit{\Lambda} p \overline{p}}$

    hep-ex 2026-05 conditional novelty 8.0

    First observation of Λ_b^0 → Λ p p-bar with 5.1σ significance and relative branching fraction (5.1 ± 1.3(stat) ± 0.3(syst)) × 10^{-2} to the reference mode Λ_b^0 → Λ K^+ K^-.

  3. Angular analysis of the $B^+\to\pi^+\mu^+\mu^-$ decay

    hep-ex 2026-04 unverdicted novelty 8.0

    First measurement of A_FB and F_H in B+→π+μ+μ− decay is consistent with Standard Model predictions in both high- and low-mass dimuon regions.

  4. Observation of a new excited charm-strange meson $D_{s1}(2933)^+$ in $B^0\to D^+ D^- K^+ \pi^-$ decays

    hep-ex 2026-04 accept novelty 8.0

    A new charm-strange resonance D_s1(2933)^+ with J^P=1^+ is observed at >10 sigma in B^0 to D+ D- K+ pi- decays, with measured mass 2933 MeV and width 72 MeV.

  5. Study of the $B^0 \to \Lambda_c^+ \bar{\Lambda}_c^- K_S^0$ decay

    hep-ex 2026-04 unverdicted novelty 7.0

    Relative branching fraction B(B0 → Λc+ Λc- KS0)/B(B+ → Λc+ Λc- K+) measured as 0.53 ± 0.05 ± 0.05 with 3.9σ evidence for Ξc(2923)+ and Ξc(2939)+ resonances consistent with isospin partners.

  6. Measurement of the CKM angle $\gamma$ in $B^{\pm} \rightarrow D(\rightarrow K^{0}_{\rm S} h^{\prime+}h^{\prime-})h^{\pm}$ decays with a novel approach

    hep-ex 2026-04 unverdicted novelty 7.0

    A novel model-independent approach with per-event phase-space weights on combined BESIII and LHCb data measures the CKM angle γ as (71.3 ± 5.0)° in B± → D(→ K0S h'+h'-) h± decays.

  7. Observation of the decay $\chi_{c1}(3872)\rightarrow J\mskip -3mu/\mskip -2mu\psi \mu^+\mu^-$

    hep-ex 2026-01 accept novelty 7.0

    First observation of χ_c1(3872) → J/ψ μ⁺μ⁻ reported at 6.5σ significance with branching fraction ratio (1.68 ± 0.37) × 10^{-3} relative to the π⁺π⁻ mode.

  8. Test of lepton flavour universality with $B^0\to K^{*0}\ell^+\ell^-$ decays at large dilepton invariant mass

    hep-ex 2026-04 accept novelty 6.0

    R_K*0 is measured as 1.08^{+0.14}_{-0.12}(stat) ± 0.07(syst) for q² > 14 GeV²/c⁴ in B⁰ → K*⁰ ℓ⁺ℓ⁻ decays, consistent with the Standard Model.

  9. Search for the lepton-flavour violating decays $B^+ \to \pi^+ \mu^\pm e^\mp$

    hep-ex 2026-04 accept novelty 6.0

    No signal observed for B+ → π+ μ± e∓; branching fraction upper limit set at 1.8 × 10^{-9} at 90% CL.

  10. Measurement of charged-hadron distributions in heavy-flavor jets in proton-proton collisions at $\sqrt{s}$=13 TeV

    hep-ex 2025-11 conditional novelty 6.0

    Charged-hadron distributions in heavy-flavor jets differ from light-quark jets in ways consistent with dead-cone suppression and hard fragmentation of the heavy hadron.

  11. Search for $K_{\mathrm{S(L)}}^{0} \rightarrow \pi^{+}\pi^{-}\mu^{+}\mu^{-}$ decays at LHCb

    hep-ex 2025-11 accept novelty 6.0

    No evidence for KS0 or KL0 to pi+ pi- mu+ mu- decays; first upper limits set at 1.4e-9 and 6.6e-7 (90% CL).

  12. Measurement of inclusive production of charmonium states in $b$-hadron decays via their decay into $\phi \phi$

    hep-ex 2026-04 unverdicted novelty 5.0

    LHCb reports branching fractions B(b→χ_c0,1,2 X) and B(b→η_c(2S)X)×B(η_c(2S)→φφ) plus the most precise η_c(1S) mass from φφ decays in 5.9 fb⁻¹ of data.

  13. Measurement of the $W$-boson production cross-sections in $pp$ collisions at $\sqrt{s}$ = 13 TeV in the forward region

    hep-ex 2026-04 accept novelty 4.0

    LHCb measures forward W+ and W- production cross-sections of 1754.2 pb and 1178.1 pb at 13 TeV, agreeing with NNLO QCD predictions at higher precision than prior results.

  14. Measurement of the branching fractions and longitudinal polarisations of $B^0_{(s)} \to K^{*0} \kern 0.18em \overline{\kern -0.18em K}{}^{*0}$ decays

    hep-ex 2025-12 accept novelty 4.0

    LHCb measures f_L^d = 0.600 and f_L^s = 0.159 for B to K* Kbar* decays and reports a ratio L of 4.92 that confirms a 4.4 sigma discrepancy with theory.

  15. Branching fraction measurement of the $\mathit{\Lambda} \to p \mu^- \overline{\nu}_{\mu}$ decay

    hep-ex 2025-11 accept novelty 4.0

    Branching fraction B(Λ → p μ⁻ ν̄_μ) measured as (1.462 ± 0.016 ± 0.100 ± 0.011) × 10^{-4}, improving prior precision by a factor of two and yielding R^{μe} = 0.175 ± 0.012 consistent with the Standard Model.

  16. Search for the decays $B_{(s)}^0\to J/\psi\gamma$ at LHCb

    hep-ex 2026-04 accept novelty 3.0

    Upper limits of 2.9×10^{-6} for B_s^0 and 2.5×10^{-6} for B^0 on the branching fractions to J/ψγ at 90% CL, with the B_s limit improved by a factor of 2.5.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · cited by 16 Pith papers

  1. [1]

    Stripping project. https://gitlab.cern.ch/lhcb/Stripping

  2. [2]

    Abdelmotteleb, A., Bertolin, A., Burr, C., Couturier, B., Eckstein, E., Fazzini, D., Grieser, N., Haen, C., O’Neil, R., Rodrigues, E., Skidmore, N., Smith, M., Wiederhold, A.R., Zhang, S.: The LHCb Sprucing and Analysis Productions. Computing and Software for Big Science 9(1), 15 (2025). https://doi.org/10.1007/s41781-025-00144-5

  3. [3]

    Senjanović, M

    Skidmore, N., Rodrigues, E., Koppenburg, P.: Run-3 offline data processing and analysis at LHCb. PoS EPS-HEP2021, 792 (2022). https://doi.org/10.22323/1.398.0792

  4. [4]

    Evans, L., Bryant, P.: LHC Machine. Journal of Instrumentation 3(08), S08001 (2008). https://doi.org/10.1088/1748-0221/3/08/S08001

  5. [5]

    Brüning, O.S., Collier, P., Lebrun, P., Myers, S., Ostojic, R., Poole, J., Proudlock, P.: LHC Design Report. CERN Yellow Reports: Monographs. CERN, Geneva (2004). https://doi.org/10.5170/CERN-2004-003-V-1. https://cds.cern.ch/record/782076

  6. [6]

    DaVinci project. https://gitlab.cern.ch/lhcb/DaVinci

  7. [7]

    LHCb Collaboration: The LHCb detector at the LHC. Journal of Instrumentation 3(08), S08005 (2008), LHCb-DP-2008-001. https://doi.org/10.1088/1748-0221/3/08/S08005

  8. [8]

    Brunel project. https://gitlab.cern.ch/lhcb/Brunel

  9. [9]

    Clemencic, M., Corti, G., Easo, S., Jones, C.R., Miglioranzi, S., Pappagallo, M., Robbe, P. (LHCb Collaboration): The LHCb simulation application, Gauss: Design, evolution and experience. Journal of Physics: Conference Series 331(3), 032023 (2011). https://doi.org/10.1088/1742-6596/331/3/032023

  10. [10]

    Sjöstrand, T., Mrenna, S., Skands, P.: PYTHIA 6.4 physics and manual. Journal of High Energy Physics 2006(05), 026 (2006). https://doi.org/10.1088/1126-6708/2006/05/026

  11. [11]

    Lange, D.J.: The EvtGen particle decay simulation package. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 462(1), 152–155 (2001). https://doi.org/10.1016/S0168-9002(01)00089-4. BEAUTY2000, Proceedings of the 7th Int. Conf. on B-Physics at Hadron Machines

  12. [12]

    Agostinelli, S., Allison, J., Amako, K., Apostolakis, J., Araujo, H., Arce, P., Asai, M., Axen, D., Banerjee, S., Barrand, G., Behner, F., Bellagamba, L., Boudreau, J., Broglia, L., Brunengo, A., Burkhardt, H., Chauvie, S., Chuma, J., Chytracek, R., Cooperman, G., Cosmo, G., Degtyarenko, P., Dell’Acqua, A., Depaola, G., Dietrich, D., Enami, R., Feliciel...

  13. [13]

    Upgrade Software and Computing. Technical report, CERN, Geneva (2018). https://doi.org/10.17181/CERN.LELX.5VJY. https://cds.cern.ch/record/2310827

  14. [14]

    Barrand, G., et al.: GAUDI - A software architecture and framework for building HEP data processing applications. Comput. Phys. Commun. 140, 45–55 (2001). https://doi.org/10.1016/S0010-4655(01)00254-5

  15. [15]

    Phys project. https://gitlab.cern.ch/lhcb/Phys

  16. [16]

    Rec project. https://gitlab.cern.ch/lhcb/Rec

  17. [17]

    Lbcom project. https://gitlab.cern.ch/lhcb/Lbcom

  18. [18]

    LHCb project. https://gitlab.cern.ch/lhcb/LHCb

  19. [19]

    LCG Releases. https://ep-dep-sft.web.cern.ch/document/lcg-releases

  20. [20]

    CERN JIRA project. https://its.cern.ch/jira/secure/credits/AroundTheWorld.jspa

  21. [21]

    Bernet, R., et al.: DIRAC: The Distributed MC Production and Analysis for LHCb (2004)

  22. [22]

    Ferro-Luzzi, M.: Proposal for an absolute luminosity determination in colliding beam experiments using vertex detection of beam–gas interactions. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 553(3), 388–399 (2005). https://doi.org/10.1016/j.nima.2005.07.010