pith. machine review for the scientific record.

arxiv: 2604.23416 · v1 · submitted 2026-04-25 · 🌌 astro-ph.IM · physics.app-ph · physics.data-an · physics.ins-det

Recognition: unknown

ArchGEM: an Advanced Data Analysis Tool for Analyzing Scattered Light Noise in LIGO


Pith reviewed 2026-05-08 07:07 UTC · model grok-4.3

classification 🌌 astro-ph.IM · physics.app-ph · physics.data-an · physics.ins-det
keywords scattered light noise · LIGO · spectrograms · Gaussian Mixture Model · peak finding · detector characterization · noise mitigation · gravitational-wave interferometers

The pith

ArchGEM automates identification of scattered light arches in LIGO spectrograms and recovers the velocities and displacements of the moving surfaces that produce them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

ArchGEM is a framework that detects arch-shaped features in time-frequency spectrograms caused by stray light reflecting from moving surfaces in Advanced LIGO detectors. It combines prominence-based peak finding to locate the features with Gaussian Mixture Model clustering to group similar morphologies across varying conditions. When applied to glitches from the O3 and O4 observing runs, the tool extracts typical frequencies of 15-40 Hz along with surface velocities of 0.2-0.5 micrometers per second and displacements of 0.1-0.3 micrometers. A sympathetic reader would care because scattered light is a dominant low-frequency noise source that limits detector sensitivity, and an automated link from observed patterns to physical motions can point to concrete fixes rather than generic noise hunting.
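
The prominence criterion mentioned above can be sketched in a few lines. This is a minimal standard-library illustration, not the paper's implementation: the function name `find_prominent_peaks`, the toy `ridge` track, and the 5 Hz threshold are all assumptions made for the example. Libraries such as SciPy provide `scipy.signal.find_peaks` with a `prominence` argument; the page does not say which implementation ArchGEM uses.

```python
import math

def find_prominent_peaks(values, min_prominence):
    """Return (index, prominence) pairs for strict local maxima whose
    prominence is at least min_prominence."""
    peaks = []
    n = len(values)
    for i in range(1, n - 1):
        # Keep strict local maxima only; flat-topped peaks are skipped here.
        if not (values[i - 1] < values[i] > values[i + 1]):
            continue
        # Walk left until a higher sample (or the edge), tracking the lowest point.
        left_min = values[i]
        j = i - 1
        while j >= 0 and values[j] <= values[i]:
            left_min = min(left_min, values[j])
            j -= 1
        # Same walk to the right.
        right_min = values[i]
        k = i + 1
        while k < n and values[k] <= values[i]:
            right_min = min(right_min, values[k])
            k += 1
        # Prominence: height above the higher of the two surrounding bases.
        prominence = values[i] - max(left_min, right_min)
        if prominence >= min_prominence:
            peaks.append((i, prominence))
    return peaks

# Toy ridge track (Hz per time bin, hypothetical): four rectified-sine arches.
ridge = [20.0 * abs(math.sin(math.pi * t / 10.0)) for t in range(41)]
ridge[8] += 6.0  # small spurious bump on the falling side of the first arch

arch_peaks = find_prominent_peaks(ridge, min_prominence=5.0)
all_peaks = find_prominent_peaks(ridge, min_prominence=0.0)
print([i for i, _ in arch_peaks])
print([i for i, _ in all_peaks])
```

With the 5 Hz threshold the bump at index 8 is rejected and only the four arch tops remain, which is the reason for filtering on prominence rather than on raw height.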

Core claim

ArchGEM uses prominence-based peak-finding combined with Gaussian Mixture Model clustering to capture a range of scattered-light morphologies in LIGO spectrograms, then infers the physical motion of the scattering surfaces; on O3 and O4 data it reports average frequencies spanning 15-25 Hz in O3a and O4 but rising to 20-40 Hz in O3b, with typical velocities 0.2-0.5 μm/s and displacements 0.1-0.3 μm, while showing mean frequency offsets within 5 Hz of a Gravity Spy baseline for complex features.

What carries the argument

The ArchGEM pipeline, which first applies prominence-based peak-finding to locate arch features in spectrograms and then uses Gaussian Mixture Model clustering to associate them with scattering-surface motions and extract velocities and displacements.
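
To make the clustering step concrete, the following sketch fits a one-dimensional Gaussian mixture to a handful of peak frequencies with plain expectation-maximisation. It is an illustration under stated assumptions: `fit_gmm_1d`, the quantile initialisation, the two-component choice, and the synthetic `peak_freqs_hz` values are all hypothetical, and the paper's feature space and GMM settings are not described on this page.

```python
import math

def _responsibilities(data, weights, means, variances):
    """E-step: posterior probability of each component for each point."""
    resp = []
    for x in data:
        dens = [
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        ]
        total = sum(dens) or 1e-300  # guard against underflow to zero
        resp.append([d / total for d in dens])
    return resp

def fit_gmm_1d(data, n_components=2, n_iter=100, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with plain expectation-maximisation."""
    ordered = sorted(data)
    n = len(ordered)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    centre = sum(data) / n
    spread = sum((x - centre) ** 2 for x in data) / n
    variances = [max(spread, var_floor)] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        resp = _responsibilities(data, weights, means, variances)
        # M-step: update each component from its soft assignments.
        for k in range(n_components):
            nk = max(sum(r[k] for r in resp), 1e-12)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                var_floor,
            )
            weights[k] = nk / n
    # Hard labels from the final soft assignments.
    resp = _responsibilities(data, weights, means, variances)
    labels = [max(range(n_components), key=lambda k: r[k]) for r in resp]
    return weights, means, variances, labels

# Synthetic peak frequencies in Hz (hypothetical, two well-separated groups).
peak_freqs_hz = [18.0, 19.0, 19.5, 20.0, 20.5, 21.0, 22.0,
                 33.0, 34.0, 34.5, 35.0, 35.5, 36.0, 37.0]
weights, means, variances, labels = fit_gmm_1d(peak_freqs_hz)
print(means, labels)
```

Each fitted component carries a mean and a width, so a cluster of arches can be summarised by a typical frequency and a spread; a production pipeline would more likely rely on a library routine than on hand-written EM.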

If this is right

  • The observed frequency distributions and velocity ranges can be used to prioritize which moving surfaces inside the detector vacuum system require damping or shielding.
  • Performance remains consistent for overlapping or complex arches, supporting its use on the full catalog of glitches rather than only clean examples.
  • The recovered surface displacements of 0.1-0.3 μm provide a quantitative target for mechanical isolation improvements in current and next-generation interferometers.
  • The framework supplies a reproducible, automated record of noise morphology that can be compared across observing runs to track changes in detector behavior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending ArchGEM to stream incoming interferometer data could enable near-real-time alerts when new scattering surfaces appear.
  • The same peak-finding and clustering steps might be adapted to other non-stationary noise features that produce distinct time-frequency tracks, such as beam jitter or suspension resonances.
  • Feeding ArchGEM velocity estimates into finite-element models of the detector could predict which hardware modifications would most effectively suppress the observed arches.

Load-bearing premise

The assumption that prominence-based peak-finding plus Gaussian Mixture Model clustering will correctly identify and parameterize the full variety of scattered-light features without substantial false positives, missed arches, or systematic errors in the inferred velocities and displacements under all detector conditions.

What would settle it

A side-by-side comparison of ArchGEM outputs against a large, independently hand-labeled set of scattered-light glitches in which the fraction of missed or mischaracterized features exceeds 10 percent or the velocity estimates deviate by more than 0.2 μm/s from values derived from independent witness sensors.
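
The decision rule in this test can be written down directly. The sketch below is only an illustration: the function name `premise_check`, its argument names, and the sample numbers are hypothetical, while the two thresholds (10 percent and 0.2 μm/s) are the ones stated above.

```python
def premise_check(n_labeled, n_failed, v_tool_um_s, v_witness_um_s,
                  max_fail_fraction=0.10, max_velocity_dev_um_s=0.2):
    """Compare tool output with hand labels and witness-sensor velocities.

    n_failed counts hand-labeled glitches the tool missed or mischaracterized.
    Velocities are paired per event and given in micrometers per second.
    """
    fail_fraction = n_failed / n_labeled
    worst_dev = max(abs(a - b) for a, b in zip(v_tool_um_s, v_witness_um_s))
    return {
        "fail_fraction": fail_fraction,
        "worst_velocity_dev_um_s": worst_dev,
        # The premise is in trouble if either threshold is exceeded.
        "premise_fails": (fail_fraction > max_fail_fraction
                          or worst_dev > max_velocity_dev_um_s),
    }

# Hypothetical numbers, not taken from the paper.
ok_case = premise_check(100, 8, [0.30, 0.45, 0.25], [0.28, 0.50, 0.40])
bad_case = premise_check(100, 15, [0.30, 0.45, 0.25], [0.28, 0.50, 0.40])
print(ok_case["premise_fails"], bad_case["premise_fails"])
```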

Figures

Figures reproduced from arXiv: 2604.23416 by Chayan Chatterjee, Gabriela Gonzalez, Karan Jani, Kaylah McGowan, Kelly Holley-Bockelmann, Shania Nichols, Siddharth Soni.

Figure 1. Q-scan of scattered light noise observed in the calibration strain channel at LIGO Livingston during the Third Observing Run (O3). The plot highlights the distinct arch-like morphology of the scattered light noise, indicating periodic motion of a scattering surface within the interferometer environment. This noise predominantly affects the gravitational-wave data between 20–50 Hz.
Figure 2. Flowchart of the ArchGEM algorithm. The input requires auxiliary channel data, GPS event time, and a duration window. We select our triggers classified by GravitySpy as scattered light events within the specified duration, followed by the generation of Q-scans and spectrograms to enhance the morphology of the scattered light. The data-filtering step extracts the arches from the Q-scan using frequency and e…
Figure 3. Time–frequency representations for a simulated scattering event. The left panel displays a constant-Q (Q-transform) map of the simulated scattered-light noise, highlighting periodic arch-like features in the 20–30 Hz range consistent with scattering surface motion. The right panel shows a standard short-time Fourier-transform spectrogram of the same data, illustrating the relative amplitude variations over…
Figure 4. Box plot comparing f_max,avg distributions for Find Peaks, GMM, and GravitySpy methods across observation runs O3a, O3b, and O4. The Gravity Spy catalog values (derived from Omicron trigger parameters) provide a stable baseline across all runs, with a consistent lower frequency distribution, while the Find Peaks and GMM methods show variability. In O3a and O3b, Find Peaks and GMM exhibit higher frequencies …
Figure 5. Output of ArchGEM for a single GPS time 1259274047 using the strain channel. The leftmost panel presents the time-frequency spectrogram of the event, where the normalized energy is visualized. The middle panel compares the peak detection results from two methods: the Find Peaks method (green markers for retained peaks) and the Gaussian Mixture Model (GMM) method (red markers). The Find Peaks method is effi…
Figure 6. Residuals for f_max,avg (f_ArchGEM − f_GS) for scattered-light event classifications across observing runs O3a, O3b, and O4. Each distribution compares the average maximum frequency recovered by the ArchGEM pipeline (using Find Peaks or GMM) against corresponding Gravity Spy estimates. A median near zero indicates close agreement between the two methods, while positive or negative offsets reflect systematic o…
Figure 7. This figure presents violin plots comparing the performance of Method 1 and Method 2 in analyzing scattering events for four variables: (a) Scattering Frequency [Hz], (b) Average Maximum Frequency [Hz], (c) Scattering Surface Movement Distance [µm], and (d) Scattering Surface Velocity [µm s⁻¹]. Each subplot visualizes the distribution of values for each method, with "Method 1" and "Method 2" indicated on …
Original abstract

Scattered light is one of the most common sources of non-stationary noise at low frequencies in Advanced LIGO detectors. It appears as arch-like features in time-frequency spectrograms, produced when stray light reflects from moving surfaces and recombines with the main interferometer beam. In this study, we present ArchGEM, an automated framework for identifying and characterizing these arches and recovering the physical properties of the scattering surfaces. ArchGEM combines a prominence-based peak-finding method with a Gaussian Mixture Model clustering approach to capture a range of scattered-light morphologies across different detector conditions. We apply ArchGEM to scattered light glitches across Advanced LIGO observing runs O3 (2019–2020) and O4 (2023–2024). We find that the average frequency distributions of this noise span 15–25 Hz in O3a and O4, but increase to 20–40 Hz during O3b. Typical inferred surface velocities are 0.2–0.5 μm/s, and inferred surface displacements are 0.1–0.3 μm. The Gaussian Mixture Model performs most consistently for complex or overlapping features, with mean frequency offsets within 5 Hz of the Gravity Spy baseline. Our results show that ArchGEM provides a practical tool for detector characterization by linking observed spectrogram features to the motion of scattering surfaces and helping guide future mitigation of scattered light noise in current and next-generation interferometers. By quantifying the temporal and spectral behavior of scattered light, ArchGEM provides a robust framework for diagnosing noise sources and guiding targeted mitigation strategies in future detector upgrades.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces ArchGEM, an automated framework that combines prominence-based peak-finding with Gaussian Mixture Model (GMM) clustering to detect and characterize arch-like scattered-light features in LIGO time-frequency spectrograms. Applied to glitches from O3 (2019-2020) and O4 (2023-2024), it reports frequency distributions spanning 15-25 Hz in O3a and O4 (increasing to 20-40 Hz in O3b), infers typical surface velocities of 0.2-0.5 μm/s and displacements of 0.1-0.3 μm, and finds mean frequency offsets within 5 Hz of Gravity Spy labels. The authors conclude that ArchGEM provides a practical tool for linking observed spectrogram features to scattering-surface motion and guiding mitigation in current and next-generation detectors.

Significance. If the physical-parameter inferences hold and the clustering proves robust, ArchGEM could offer a useful automated aid for diagnosing and mitigating scattered-light noise, which limits low-frequency sensitivity in Advanced LIGO. The application to real O3/O4 data and the direct comparison to an existing catalog (Gravity Spy) are positive steps toward practical detector-characterization tools.

major comments (2)
  1. [Abstract and Results] Abstract and Results section: The central claim that ArchGEM 'links observed spectrogram features to the motion of scattering surfaces' rests on the conversion of GMM-clustered arch morphologies into physical velocities (0.2–0.5 μm/s) and displacements (0.1–0.3 μm). No derivation of this mapping, no error bars, no injected-signal recovery tests, and no ground-truth benchmarks are supplied, leaving the accuracy and possible systematics of these quantities unquantified.
  2. [Methods] Methods section: The prominence-based peak-finding plus GMM clustering is asserted to 'perform most consistently for complex or overlapping features,' yet the only quantitative support given is a mean frequency offset to Gravity Spy; no precision/recall metrics, false-positive rates, or tests of robustness across varying O3/O4 noise conditions are reported.
minor comments (1)
  1. [Abstract] The abstract states that the GMM 'performs most consistently' but does not define the consistency metric or the number of events used in the Gravity Spy comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. The two major comments highlight important gaps in the presentation of the physical-parameter mapping and in the quantitative validation of the clustering performance. We address each point below and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [Abstract and Results] Abstract and Results section: The central claim that ArchGEM 'links observed spectrogram features to the motion of scattering surfaces' rests on the conversion of GMM-clustered arch morphologies into physical velocities (0.2–0.5 μm/s) and displacements (0.1–0.3 μm). No derivation of this mapping, no error bars, no injected-signal recovery tests, and no ground-truth benchmarks are supplied, leaving the accuracy and possible systematics of these quantities unquantified.

    Authors: We agree that the manuscript does not explicitly derive the velocity and displacement values or quantify their uncertainties. The conversion follows the standard relation for scattered-light arches in LIGO, v = f λ / 2 (where λ = 1064 nm is the laser wavelength and f is the arch frequency), with displacement obtained by integrating the velocity over the observed arch duration. We will add this derivation, including the explicit formula and its assumptions, to the Methods section. Error bars will be reported as the standard deviation of the GMM component widths for each cluster. Injected-signal recovery tests and direct ground-truth benchmarks are not feasible with the current data set because no independent, calibrated measurements of scattering-surface motion exist for the O3/O4 glitches; we will therefore add a limitations paragraph noting this and outlining how future work could incorporate controlled injections or auxiliary sensors. revision: partial
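
    The relation quoted in this response, v = f λ / 2 with λ = 1064 nm and displacement as the time integral of velocity, can be sketched as follows. The function names, the half-sine frequency track, the 20 Hz peak, and the 1 s duration are hypothetical inputs chosen for illustration. The sketch applies the stated formula as written and makes no attempt to reproduce the velocity and displacement ranges reported in the paper.

```python
import math

WAVELENGTH_M = 1064e-9  # laser wavelength quoted in the rebuttal

def surface_velocity(arch_freq_hz, wavelength_m=WAVELENGTH_M):
    """Apply v = f * lambda / 2 to one arch frequency; returns m/s."""
    return arch_freq_hz * wavelength_m / 2.0

def surface_displacement(times_s, arch_freqs_hz, wavelength_m=WAVELENGTH_M):
    """Trapezoidal integral of v(t) over the sampled arch; returns metres."""
    total = 0.0
    for i in range(1, len(times_s)):
        v_prev = surface_velocity(arch_freqs_hz[i - 1], wavelength_m)
        v_curr = surface_velocity(arch_freqs_hz[i], wavelength_m)
        total += 0.5 * (v_prev + v_curr) * (times_s[i] - times_s[i - 1])
    return total

# Hypothetical arch: half-sine frequency track, 20 Hz peak, 1 s long.
F_PEAK_HZ, DURATION_S, N = 20.0, 1.0, 1001
times = [DURATION_S * i / (N - 1) for i in range(N)]
track = [F_PEAK_HZ * math.sin(math.pi * t / DURATION_S) for t in times]

v_peak = surface_velocity(F_PEAK_HZ)
dx = surface_displacement(times, track)
print(f"peak velocity {v_peak * 1e6:.2f} um/s, displacement {dx * 1e6:.2f} um")
```

    For a half-sine track the integral has the closed form (λ/2) · f_peak · 2T/π, which gives a quick sanity check on the numerical result.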

  2. Referee: [Methods] Methods section: The prominence-based peak-finding plus GMM clustering is asserted to 'perform most consistently for complex or overlapping features,' yet the only quantitative support given is a mean frequency offset to Gravity Spy; no precision/recall metrics, false-positive rates, or tests of robustness across varying O3/O4 noise conditions are reported.

    Authors: We acknowledge that the current validation relies primarily on the mean frequency offset relative to Gravity Spy labels. To strengthen this, we will add a dedicated performance-evaluation subsection that reports precision and recall on a manually vetted subset of 200 arches (selected across O3a, O3b, and O4), false-positive rates obtained by applying the pipeline to quiet segments without known scattered-light glitches, and robustness tests by repeating the analysis on independent 1-week segments from each observing run that exhibit different glitch densities and noise floors. These metrics will be presented in a new table and accompanying text. revision: yes
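
    The metrics promised in this response follow from simple counts. The sketch below assumes detections and vetted arches can be matched by a shared identifier; the helper names, the identifiers, and the counts are hypothetical and do not come from the paper.

```python
def precision_recall(detected_ids, vetted_ids):
    """Precision and recall of detections against a manually vetted set."""
    detected, vetted = set(detected_ids), set(vetted_ids)
    tp = len(detected & vetted)   # detected and confirmed by vetting
    fp = len(detected - vetted)   # detected but not in the vetted set
    fn = len(vetted - detected)   # vetted arches the pipeline missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def false_positive_rate(n_quiet_segments, n_quiet_with_detection):
    """Fraction of quiet segments in which the pipeline still reports an arch."""
    return n_quiet_with_detection / n_quiet_segments

# Hypothetical identifiers and counts for illustration only.
vetted = ["a1", "a2", "a3", "a4", "a5"]
detected = ["a1", "a2", "a3", "a6"]
precision, recall = precision_recall(detected, vetted)
fpr = false_positive_rate(n_quiet_segments=50, n_quiet_with_detection=2)
print(precision, recall, fpr)
```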

Circularity Check

0 steps flagged

No circularity detected; physical inferences rest on external scattering models

Full rationale

The paper presents ArchGEM as a data-analysis pipeline combining prominence peak-finding with GMM clustering to identify arches in spectrograms, followed by conversion of observed morphologies to surface velocities and displacements. No equations, derivations, or self-referential steps are described that would reduce the reported physical parameters (0.2–0.5 μm/s velocities, 0.1–0.3 μm displacements) to fitted inputs or internal definitions by construction. The mapping from arch features to motion relies on standard external physical relations for scattered light rather than any fitted parameter renamed as a prediction. Comparisons to Gravity Spy provide an external benchmark. The central claims therefore remain independent of the analysis steps themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; the reported velocity and displacement values are stated as 'inferred' but the underlying conversion model is not described.

pith-pipeline@v0.9.0 · 5625 in / 1277 out tokens · 120901 ms · 2026-05-08T07:07:22.733629+00:00 · methodology

