pith. machine review for the scientific record.

arxiv: 2604.11920 · v1 · submitted 2026-04-13 · ⚛️ physics.geo-ph

Recognition: unknown

pDSurfTomo: A High-Performance Parallel Computing Package for Direct Surface Wave Tomography

Guoyi Chen, Hongjian Fang, Huajian Yao, Junlun Li, Shaohang Zhu

Pith reviewed 2026-05-10 14:56 UTC · model grok-4.3

classification ⚛️ physics.geo-ph
keywords: surface wave tomography · parallel computing · CPU-GPU acceleration · seismic inversion · dispersion data · traveltime tomography · high-resolution imaging

The pith

A hybrid CPU-GPU package makes direct surface wave tomography feasible for large station networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces pDSurfTomo as a re-engineered version of the serial DSurfTomo method that incorporates parallel processing on CPUs and GPUs to handle the computational demands of high-resolution surface wave tomography. It targets three main bottlenecks: sensitivity kernel calculation through vectorized parallel design, traveltime modeling via OpenMP parallelization of the fast marching method, and solution of large sparse linear systems on GPU hardware. The authors supply a cross-platform GUI that lets users run jobs locally while offloading heavy work to remote clusters. When tested on a real dispersion dataset from 229 stations in North China, the new code completes the inversion more than ten times faster than the original while producing results with negligible differences. This positions the package as a practical tool for scaling direct surface wave tomography to current demands for denser data and finer models.

Core claim

By combining refined parallel sensitivity kernel computation, OpenMP parallelization of surface wave traveltime modeling, and GPU acceleration for large-scale sparse linear least-squares problems, pDSurfTomo overcomes the serial limitations of DSurfTomo and reduces computation time by more than an order of magnitude on observed dispersion data from 229 stations while maintaining negligible discrepancy with the original serial results.

What carries the argument

Hybrid CPU-GPU acceleration that parallelizes sensitivity kernel evaluation with vectorization, distributes the fast marching method across CPU cores, and offloads the linear solver to the GPU.
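
To make the traveltime step concrete, here is a minimal sketch of a first-order fast marching solver on a 2-D grid, with independent sources dispatched to a thread pool. It is not the package's code. The function names (`fast_marching`, `traveltimes_for_sources`), the grid layout, and the choice to parallelize across sources are assumptions made for illustration; the text above says only that the fast marching method is parallelized with OpenMP. Python threads stand in for OpenMP threads purely to show that per-source runs share no mutable state; they will not reproduce the reported speedup.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def fast_marching(slowness, source, h=1.0):
    """First-order fast marching traveltimes on a 2-D grid from one source.

    slowness: list of rows (ny x nx) of slowness values; source: (row, col).
    """
    ny, nx = len(slowness), len(slowness[0])
    t = [[math.inf] * nx for _ in range(ny)]
    done = [[False] * nx for _ in range(ny)]

    def known(j, i):
        # Traveltime of an accepted node, or infinity otherwise.
        inside = 0 <= j < ny and 0 <= i < nx
        return t[j][i] if inside and done[j][i] else math.inf

    sy, sx = source
    t[sy][sx] = 0.0
    heap = [(0.0, sy, sx)]
    while heap:
        _, y, x = heapq.heappop(heap)
        if done[y][x]:
            continue  # stale heap entry, node already accepted
        done[y][x] = True
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            j, i = y + dy, x + dx
            if not (0 <= j < ny and 0 <= i < nx) or done[j][i]:
                continue
            # Upwind (smallest accepted) neighbour along each axis.
            ty = min(known(j - 1, i), known(j + 1, i))
            tx = min(known(j, i - 1), known(j, i + 1))
            f = slowness[j][i] * h
            a, b = min(tx, ty), max(tx, ty)
            if b - a >= f:
                new = a + f  # only one axis contributes
            else:
                # Root of (T - a)^2 + (T - b)^2 = f^2 with T > b.
                new = 0.5 * (a + b + math.sqrt(2.0 * f * f - (a - b) ** 2))
            if new < t[j][i]:
                t[j][i] = new
                heapq.heappush(heap, (new, j, i))
    return t


def traveltimes_for_sources(slowness, sources, h=1.0, workers=4):
    """Run one independent fast marching solve per source (assumed granularity)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: fast_marching(slowness, s, h), sources))
```

Because each solve allocates its own traveltime and status arrays and only reads the slowness grid, the per-source loop has no data dependencies. That independence is what makes a loop of this kind a natural target for a parallel-for directive in a compiled implementation.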

If this is right

  • Datasets from hundreds of stations become practical to invert at high resolution without prohibitive run times.
  • The same workflow can be applied to other regional or continental-scale surface wave studies that previously exceeded serial limits.
  • The provided GUI with remote cluster support lowers the barrier for users without local high-performance hardware.
  • Routine production of finer-scale shear-velocity models of the crust and upper mantle becomes feasible.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar parallel redesigns could be applied to other serial tomography codes that rely on ray tracing or sensitivity kernels.
  • The speedup opens the possibility of repeated inversions for uncertainty quantification or time-lapse monitoring.
  • Integration with emerging dense nodal arrays could push resolution toward kilometer-scale imaging of the lithosphere.

Load-bearing premise

The parallel CPU and GPU changes preserve numerical accuracy and convergence behavior, and introduce no artifacts, relative to the original serial code for all relevant problem sizes.

What would settle it

Running the parallel and serial versions on the same large dispersion dataset and finding either a runtime reduction of less than a factor of ten or velocity model differences large enough to affect geological interpretation.
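
The check described above can be sketched in a few lines. This is an illustrative helper, not part of the paper: the function names and the `max_rel_diff` threshold are hypothetical, and what counts as a difference "large enough to affect geological interpretation" is a judgment the relative L2 norm can only approximate.

```python
import math


def relative_l2_difference(model_ref, model_new):
    """Return ||new - ref||_2 / ||ref||_2 for two flattened velocity models."""
    if len(model_ref) != len(model_new):
        raise ValueError("models must have the same number of cells")
    num = math.sqrt(sum((b - a) ** 2 for a, b in zip(model_ref, model_new)))
    den = math.sqrt(sum(a * a for a in model_ref))
    return num / den


def settles_claim(t_serial, t_parallel, model_ref, model_new,
                  min_speedup=10.0, max_rel_diff=1e-3):
    """Compare serial and parallel runs against the two stated criteria.

    min_speedup follows the factor of ten in the text; max_rel_diff is a
    hypothetical tolerance chosen only for illustration.
    """
    speedup = t_serial / t_parallel
    rel = relative_l2_difference(model_ref, model_new)
    return {
        "speedup": speedup,
        "rel_l2": rel,
        "claim_holds": speedup >= min_speedup and rel <= max_rel_diff,
    }
```

A call such as `settles_claim(t_serial, t_parallel, vs_serial, vs_parallel)` would take wall-clock seconds from both runs and the two inverted shear-velocity models flattened to lists; none of those numbers are reported in the text above, so they would have to come from an actual rerun.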

Original abstract

Surface wave tomography is essential for investigating the shear-wave velocity structure of the crust and upper mantle. The direct surface wave tomography method, DSurfTomo, has become one of the most widely adopted packages due to its ability to account for ray path bending in complex media to increase subsurface characterization accuracy. However, its inherent serial architecture lacks effective support for multicore CPUs and GPUs. Furthermore, its built-in solver is computationally expensive when solving large-scale linear systems. Consequently, the software struggles to meet current demands for large-scale, high-resolution surface wave tomography. To address these limitations, we propose pDSurfTomo, a highly optimized package utilizing hybrid CPU-GPU acceleration. First, it overcomes the scalability bottleneck in sensitivity kernel computation through a refined parallel design; also, it uses vectorization techniques to accelerate the modeling of surface wave dispersion, achieving efficient computation of the sensitivity kernel. Second, it implements parallelization of the serial fast marching method using OpenMP, significantly reducing computation time for surface wave traveltimes. Finally, it incorporates GPU acceleration to efficiently solve large-scale sparse linear least-squares problems. To streamline the workflow, we provide a cross-platform GUI with remote server connectivity, allowing users to execute and visualize inversion tasks locally while seamlessly utilizing remote computing clusters. Application to an observed dispersion dataset from 229 stations in North China demonstrates that pDSurfTomo reduces computation time by more than an order of magnitude while maintaining a negligible discrepancy compared to the original DSurfTomo. It is expected that pDSurfTomo will provide a highly efficient and accessible solution for large-scale, high-resolution surface wave tomography.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces pDSurfTomo, a hybrid CPU-GPU parallel package extending the serial DSurfTomo code for direct surface wave tomography. It parallelizes the fast marching method via OpenMP, vectorizes sensitivity kernel computation for dispersion modeling, and accelerates large-scale sparse linear least-squares inversion on GPUs, while adding a cross-platform GUI with remote cluster support. Application to a 229-station observed dispersion dataset from North China is reported to yield more than 10x reduction in computation time with negligible discrepancy relative to the original serial implementation.

Significance. If the performance gains can be shown to preserve numerical fidelity across problem sizes, the package would enable routine high-resolution surface wave tomography at scales previously limited by serial computation, directly benefiting crustal and upper-mantle imaging studies that rely on accurate ray-path bending in complex media.

major comments (2)
  1. [Application to observed dispersion dataset] The assertion of 'negligible discrepancy' is presented without quantitative support such as L2-norm differences between inverted velocity models, traveltime residual statistics, or convergence curves comparing the parallel/GPU and serial paths. This single qualitative statement on one 229-station case is load-bearing for the central claim that accuracy is preserved.
  2. [Performance evaluation] No scaling curves, hardware specifications, timing breakdowns by component (kernel computation, fast marching, solver), or tests on synthetic cases with known ground truth are provided to substantiate the >10x speedup claim or to demonstrate behavior across problem sizes beyond the single reported instance.
minor comments (1)
  1. [Methods] The description of the vectorization techniques and OpenMP parallelization of the fast marching method would benefit from explicit mention of the libraries or compiler directives employed to aid reproducibility.
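
The convergence curves requested in major comment 1 can be illustrated with a small iterative solver that records its residual at every step. The sketch below uses CGLS (conjugate gradients applied to the normal equations) on a sparse matrix stored as coordinate triplets. CGLS is a compact stand-in chosen for readability; the review text does not name the solver the package runs on the GPU, and the function names and the `damp` parameter are assumptions for illustration.

```python
import math


def matvec(coo, x, nrows):
    """y = A x for A stored as (row, col, value) triplets."""
    y = [0.0] * nrows
    for r, c, v in coo:
        y[r] += v * x[c]
    return y


def rmatvec(coo, y, ncols):
    """x = A^T y for A stored as (row, col, value) triplets."""
    x = [0.0] * ncols
    for r, c, v in coo:
        x[c] += v * y[r]
    return x


def cgls(coo, b, ncols, damp=0.0, tol=1e-10, max_iter=100):
    """Minimize ||A x - b||^2 + damp^2 ||x||^2; return (x, history).

    history[k] is the normal-equation residual ||A^T r - damp^2 x|| after
    k iterations, which is the quantity one would plot as a convergence curve.
    """
    nrows = len(b)
    x = [0.0] * ncols
    r = list(b)                      # r = b - A x with x = 0
    s = rmatvec(coo, r, ncols)       # s = A^T r - damp^2 x
    p = list(s)
    gamma = sum(v * v for v in s)
    g0 = math.sqrt(gamma)
    history = [g0]
    if g0 == 0.0:
        return x, history
    for _ in range(max_iter):
        q = matvec(coo, p, nrows)
        delta = sum(v * v for v in q) + damp ** 2 * sum(v * v for v in p)
        alpha = gamma / delta
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        atr = rmatvec(coo, r, ncols)
        s = [ai - damp ** 2 * xi for ai, xi in zip(atr, x)]
        gamma_new = sum(v * v for v in s)
        history.append(math.sqrt(gamma_new))
        if history[-1] <= tol * g0:
            break
        beta = gamma_new / gamma
        p = [si + beta * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x, history
```

Running the same routine on the serial and the accelerated code paths and overlaying the two `history` lists would show directly whether both paths converge at the same rate and to the same residual level.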

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript describing pDSurfTomo. We value the comments highlighting the need for stronger quantitative evidence to support our claims of preserved accuracy and significant performance improvements. We address each major comment below and will revise the manuscript accordingly to enhance its rigor.

Point-by-point responses
  1. Referee: Application to observed dispersion dataset: the assertion of 'negligible discrepancy' is presented without quantitative support such as L2-norm differences between inverted velocity models, traveltime residual statistics, or convergence curves comparing the parallel/GPU and serial paths; this single qualitative statement on one 229-station case is load-bearing for the central claim that accuracy is preserved.

    Authors: We agree that providing quantitative metrics is essential to substantiate the claim of negligible discrepancy. In the revised version, we will include L2-norm differences between the inverted shear-wave velocity models from pDSurfTomo and the original DSurfTomo, detailed traveltime residual statistics (e.g., mean, standard deviation, and histograms), and convergence curves for the linear solver. These will be added for the North China dataset to demonstrate that the parallel implementation preserves numerical fidelity. revision: yes

  2. Referee: Performance evaluation: no scaling curves, hardware specifications, timing breakdowns by component (kernel computation, fast marching, solver), or tests on synthetic cases with known ground truth are provided to substantiate the >10x speedup claim or to demonstrate behavior across problem sizes beyond the single reported instance.

    Authors: We acknowledge the absence of comprehensive performance benchmarks in the current manuscript. To address this, we will add the following in the revised manuscript: (1) detailed hardware specifications including CPU model, GPU model, and memory configurations; (2) timing breakdowns for sensitivity kernel computation, fast marching method, and the linear solver; (3) strong and weak scaling curves for different numbers of stations and grid sizes; and (4) results from synthetic tests with known velocity models to verify both accuracy and speedup across varying problem sizes. This will better support the >10x performance gain claim. revision: yes
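
The timing breakdown and strong-scaling figures promised in the second response reduce to simple bookkeeping. The sketch below shows one way to collect them, assuming wall-clock timing with `time.perf_counter`. The `StageTimer` class, the stage names, and `strong_scaling` are hypothetical helpers, not part of pDSurfTomo.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage of a pipeline."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def fractions(self):
        """Share of total measured time spent in each stage."""
        total = sum(self.totals.values())
        if total == 0.0:
            return {name: 0.0 for name in self.totals}
        return {name: t / total for name, t in self.totals.items()}


def strong_scaling(timings):
    """Map worker count -> (speedup, parallel efficiency).

    timings: {n_workers: wall_seconds} for a fixed problem size. Speedup is
    measured against the run with the fewest workers.
    """
    base_n = min(timings)
    base_t = timings[base_n]
    result = {}
    for n, t in sorted(timings.items()):
        speedup = base_t / t
        result[n] = (speedup, speedup * base_n / n)
    return result
```

In use, each component would be wrapped as `with timer.stage("kernel"): ...`, `with timer.stage("fast_marching"): ...`, and `with timer.stage("solver"): ...`, so that the per-component fractions show which stage dominates before and after acceleration. Weak scaling would need a second set of runs in which the problem size grows with the worker count.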

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on direct code-to-code timing and discrepancy checks

Full rationale

The paper describes a parallel re-implementation (OpenMP fast marching, vectorized kernels, GPU least-squares) of the pre-existing DSurfTomo serial code. Its central claims are measured wall-clock speedup (>10×) and “negligible discrepancy” on one 229-station North China dataset. These are direct empirical comparisons to the reference serial run, not predictions derived from fitted parameters, self-definitions, or self-citation chains. No equation is shown to equal its own input by construction, and no load-bearing uniqueness theorem or ansatz is imported from the authors’ prior work. The derivation chain is therefore self-contained against external benchmarks (the original serial code).

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software engineering and optimization paper. It introduces no new physical axioms, free parameters, or invented entities; it relies on the pre-existing DSurfTomo formulation and standard parallel libraries.


