arxiv: 2606.19213 · v1 · pith:PQ5YUVDDnew · submitted 2026-06-17 · 💻 cs.MS · cs.NA · math.NA

Evaluating Rust for Sparse Matrix Kernels in Scientific Computing

Pith reviewed 2026-06-26 18:30 UTC · model grok-4.3

classification 💻 cs.MS · cs.NA · math.NA
keywords Rust · sparse matrices · SpMV · Krylov methods · matrix exponential · performance evaluation · scientific computing · memory safety

The pith

Rust sparse kernels match Eigen and PSBLAS performance on core scientific workloads while trailing PETSc on blocked formats.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests Rust as a memory-safe systems language for the sparse matrix operations that underpin scientific computing. The authors implement sparse matrix-vector multiplication (SpMV), Lanczos-based Krylov methods, and matrix-exponential evaluation in Rust, then time them against Intel oneMKL, Eigen, PETSc, and PSBLAS on a suite of test matrices. The results show Rust reaching speeds comparable to Eigen and PSBLAS for CSC storage while falling behind PETSc's optimized blocked CSR routines. This matters because high-performance numerical code has long accepted memory-unsafe languages for the sake of speed; comparable results would let developers gain safety guarantees without giving up the performance they expect from C, C++, or Fortran.
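For readers unfamiliar with the CSC (compressed sparse column) layout, a minimal SpMV sketch may help. It is an illustration only: the `CscMatrix` type and its field names are hypothetical and are not taken from the paper or its repository.

```rust
/// Compressed Sparse Column matrix (hypothetical type, for illustration only).
pub struct CscMatrix {
    pub nrows: usize,
    pub ncols: usize,
    /// Entries of column j live at positions col_ptr[j]..col_ptr[j + 1].
    pub col_ptr: Vec<usize>,
    /// Row index of each stored entry.
    pub row_idx: Vec<usize>,
    /// Value of each stored entry.
    pub values: Vec<f64>,
}

impl CscMatrix {
    /// Computes y = A * x by adding each column, scaled by x[j], into y.
    pub fn spmv(&self, x: &[f64]) -> Vec<f64> {
        assert_eq!(x.len(), self.ncols);
        let mut y = vec![0.0; self.nrows];
        for j in 0..self.ncols {
            let xj = x[j];
            for k in self.col_ptr[j]..self.col_ptr[j + 1] {
                y[self.row_idx[k]] += self.values[k] * xj;
            }
        }
        y
    }
}
```

Because CSC walks the matrix column by column, the writes into `y` are scattered, whereas a CSR kernel accumulates one output entry per row. That difference is one plausible reason the two formats behave differently under vectorization, though the summary does not say how the paper's kernels are organized.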

Core claim

Rust implementations of the three workloads achieve performance comparable to Eigen and PSBLAS for CSC formats across the benchmark suite, while trailing PETSc's advanced blocked CSR optimizations. The study examines how compile-time monomorphization, SIMD vectorization, and FFI boundaries interact with Rust's safety model and finds that these features support competitive runtimes without prohibitive overhead.
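As a rough illustration of what compile-time monomorphization means here, the sketch below defines one generic kernel that the compiler specializes separately for each scalar type it is called with. The function name `dot` and the choice of trait bounds are assumptions made for this example, not the paper's code.

```rust
use std::ops::{Add, Mul};

/// Generic dot product. The compiler generates a separate specialized copy
/// for each concrete T used at a call site (monomorphization), so the inner
/// loop contains no dynamic dispatch.
pub fn dot<T>(a: &[T], b: &[T]) -> T
where
    T: Copy + Default + Add<Output = T> + Mul<Output = T>,
{
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b.iter())
        .fold(T::default(), |acc, (&x, &y)| acc + x * y)
}
```

Calling `dot` with `f32` slices and with `f64` slices yields two independent machine-code versions. Whether either one is auto-vectorized depends on the optimization level and target; floating-point reductions in particular are often left scalar because reordering the additions would change the result.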

What carries the argument

The three workloads (SpMV, Lanczos-based Krylov methods, and matrix-exponential evaluation) implemented natively in Rust and timed against established C++ and Fortran libraries on representative sparse matrices.
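The Lanczos workload can be pictured with the textbook three-term recurrence. The sketch below is a minimal version under stated assumptions: the operator is symmetric, it is supplied as a closure, and no reorthogonalization is performed. The name `lanczos` and its signature are hypothetical and do not describe the paper's implementation.

```rust
/// Runs up to m Lanczos steps on a symmetric operator given as `matvec`.
/// Returns (alpha, beta): the diagonal and off-diagonal of the tridiagonal
/// matrix T_m. No reorthogonalization is done in this sketch.
pub fn lanczos<F>(matvec: F, v0: &[f64], m: usize) -> (Vec<f64>, Vec<f64>)
where
    F: Fn(&[f64]) -> Vec<f64>,
{
    let n = v0.len();
    let norm0 = v0.iter().map(|x| x * x).sum::<f64>().sqrt();
    let mut v: Vec<f64> = v0.iter().map(|x| x / norm0).collect();
    let mut v_prev = vec![0.0; n];
    let mut beta_prev = 0.0;
    let (mut alpha, mut beta) = (Vec::new(), Vec::new());

    for _ in 0..m {
        let mut w = matvec(&v);
        // Remove the component along the previous basis vector.
        for (wi, vp) in w.iter_mut().zip(&v_prev) {
            *wi -= beta_prev * vp;
        }
        // Diagonal entry, then remove the component along the current vector.
        let a: f64 = w.iter().zip(&v).map(|(x, y)| x * y).sum();
        for (wi, vi) in w.iter_mut().zip(&v) {
            *wi -= a * vi;
        }
        alpha.push(a);
        let b = w.iter().map(|x| x * x).sum::<f64>().sqrt();
        // Stop after m diagonal entries or on (near) breakdown.
        if alpha.len() == m || b < 1e-14 {
            break;
        }
        beta.push(b);
        v_prev = std::mem::replace(&mut v, w.iter().map(|x| x / b).collect());
        beta_prev = b;
    }
    (alpha, beta)
}
```

Each step costs one operator application, which is why SpMV speed dominates this workload. In the standard Krylov approach to the matrix exponential, the small tridiagonal matrix built from `alpha` and `beta` is exponentiated in place of the large sparse matrix; the summary does not state which variant the paper uses.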

If this is right

  • Rust can serve as a drop-in replacement for CSC-based sparse kernels without major performance loss relative to Eigen and PSBLAS.
  • Compile-time monomorphization and auto-vectorization in Rust suffice to reach state-of-the-art speeds for these operations.
  • FFI boundaries allow Rust code to interoperate with existing libraries while preserving safety invariants.
  • Adoption of Rust would be most immediate for codes already using CSC storage rather than advanced blocked CSR formats.
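The FFI point above can be illustrated with a common pattern: a thin `unsafe` shim with a C ABI that validates raw pointers and then hands off to a safe Rust core. This is a sketch under assumptions; `example_axpy` and `axpy` are hypothetical names, and nothing here is claimed to match the interfaces the paper benchmarks.

```rust
use std::slice;

/// Safe core: y[i] += a * x[i]. All bounds are checked by the slice types.
pub fn axpy(a: f64, x: &[f64], y: &mut [f64]) {
    assert_eq!(x.len(), y.len());
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += a * xi;
    }
}

/// C-ABI entry point (hypothetical name) wrapping the safe core.
///
/// Safety: `x` and `y` must each point to `n` valid, non-overlapping f64
/// values that stay alive for the duration of the call.
pub unsafe extern "C" fn example_axpy(n: usize, a: f64, x: *const f64, y: *mut f64) {
    if n == 0 || x.is_null() || y.is_null() {
        return;
    }
    // The only unsafe step: turning raw pointers into slices.
    let (xs, ys) = unsafe {
        (slice::from_raw_parts(x, n), slice::from_raw_parts_mut(y, n))
    };
    axpy(a, xs, ys);
}
```

The design choice is to confine `unsafe` to the pointer-to-slice conversion, so the numerical loop itself stays under the borrow checker. Exporting the symbol to a C or Fortran caller would additionally need a no-mangle attribute, which is omitted here because its spelling differs between Rust editions.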

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Teams maintaining large scientific codebases could incrementally replace unsafe kernels with Rust versions where memory safety bugs are a recurring concern.
  • The same evaluation approach could be applied to other candidate languages to map the current performance-safety frontier for numerical libraries.
  • Extending the benchmarks to GPU offload or distributed-memory settings would test whether Rust's ecosystem supports the next layer of scientific workloads.

Load-bearing premise

The selected matrices and three workloads represent the main computational patterns that dominate scientific computing applications.

What would settle it

A set of benchmarks on a wider collection of matrices showing Rust kernels more than 20 percent slower than all baselines on average would falsify the comparability claim.
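One way to make this criterion concrete is sketched below. It assumes "on average" means the arithmetic mean of per-matrix relative slowdowns and that runtimes are paired matrix by matrix; the page does not specify either, and a geometric mean would be an equally defensible reading. The function names are hypothetical.

```rust
/// Mean relative slowdown of `ours` versus `baseline`, where both slices hold
/// runtimes for the same matrices in the same order.
/// A result of 0.20 means 20 percent slower on average.
pub fn mean_slowdown(ours: &[f64], baseline: &[f64]) -> f64 {
    assert_eq!(ours.len(), baseline.len());
    assert!(!ours.is_empty());
    let total: f64 = ours.iter().zip(baseline).map(|(o, b)| o / b - 1.0).sum();
    total / ours.len() as f64
}

/// True if `ours` is more than `threshold` slower on average than every baseline.
pub fn falsifies(ours: &[f64], baselines: &[Vec<f64>], threshold: f64) -> bool {
    !baselines.is_empty() && baselines.iter().all(|b| mean_slowdown(ours, b) > threshold)
}
```

Under this reading, being slow against a single library is not enough: the comparability claim fails only if the threshold is exceeded against all baselines at once.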

Original abstract

Sparse matrix kernels form the computational backbone of scientific computing, traditionally relying on C/C++ and Fortran implementations that prioritize performance over memory safety. This work evaluates Rust as a systems-level alternative for sparse linear algebra by implementing and benchmarking three core workloads: sparse matrix-vector multiplication (SpMV), Lanczos-based Krylov methods, and matrix-exponential evaluation. We compare native Rust code against established baselines (Intel oneMKL, Eigen, PETSc, and PSBLAS) across a suite of representative matrices. Our results show that Rust's sparse kernels achieve performance comparable to Eigen and PSBLAS, tracking the state-of-the-art for CSC formats, while trailing PETSc's advanced blocked CSR optimizations. By analyzing compile-time monomorphization, SIMD vectorization, and FFI boundaries, we assess the practical impact of Rust's safety model and ecosystem readiness. The study provides concrete, evidence-based guidance for modernizing high-performance numerical software stacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript evaluates Rust as a systems-level language for sparse matrix kernels in scientific computing. It implements three workloads—SpMV, Lanczos-based Krylov methods, and matrix-exponential evaluation—in native Rust and benchmarks them against Intel oneMKL, Eigen, PETSc, and PSBLAS across a suite of representative matrices. The central claim is that Rust kernels achieve performance comparable to Eigen and PSBLAS for CSC formats while trailing PETSc's blocked CSR optimizations; the work further analyzes the performance impact of Rust features including compile-time monomorphization, SIMD vectorization, and FFI boundaries to provide guidance on ecosystem readiness.

Significance. If the empirical comparisons hold and the matrix suite is representative, the paper supplies concrete evidence that Rust can serve as a competitive, memory-safe alternative for core numerical kernels without major performance penalties in CSC-based workloads. This has potential implications for modernizing scientific software stacks. The manuscript deserves credit for its direct analysis of Rust-specific mechanisms (monomorphization and FFI) and for framing its results as actionable guidance rather than abstract claims.

major comments (2)
  1. [Abstract and benchmark description] Abstract, paragraph on benchmarks: the central performance claim (Rust tracks Eigen/PSBLAS for CSC and trails PETSc blocked CSR) rests on the assertion of 'a suite of representative matrices,' yet no selection criteria, coverage of sparsity structures (block-structured FEM matrices, high-condition-number PDE matrices), or scale diversity are supplied. This omission is load-bearing because the generalization to 'scientific computing applications' cannot be evaluated without it.
  2. [Abstract] Abstract and results presentation: performance outcomes are stated without accompanying data tables, error bars, implementation details on CSC vs. CSR handling, or exclusion criteria for the matrix suite. This prevents verification that the comparison is fair and that post-hoc choices did not affect the reported conclusions.
minor comments (2)
  1. [Abstract] The abstract introduces SpMV, Lanczos, and matrix-exponential without first spelling out the acronyms or briefly defining the workloads for readers outside the immediate subfield.
  2. [Abstract] The phrase 'state-of-the-art for CSC formats' would benefit from explicit version numbers or commit hashes for the baseline libraries to allow exact reproduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and specific suggestions for improving the clarity of our benchmark description and results presentation. We address each major comment below and commit to revisions that will make the matrix suite selection and performance data more transparent and verifiable.

Point-by-point responses
  1. Referee: [Abstract and benchmark description] Abstract, paragraph on benchmarks: the central performance claim (Rust tracks Eigen/PSBLAS for CSC and trails PETSc blocked CSR) rests on the assertion of 'a suite of representative matrices,' yet no selection criteria, coverage of sparsity structures (block-structured FEM matrices, high-condition-number PDE matrices), or scale diversity are supplied. This omission is load-bearing because the generalization to 'scientific computing applications' cannot be evaluated without it.

    Authors: We agree that explicit documentation of matrix selection criteria is necessary to support generalization claims. In the revised manuscript we will add a new subsection (likely in Section 3 or 4) that details the selection process, including coverage of block-structured FEM matrices, high-condition-number PDE matrices, sparsity pattern diversity, matrix scale range, and any exclusion rules applied. This addition will directly address the load-bearing nature of the claim. revision: yes

  2. Referee: [Abstract] Abstract and results presentation: performance outcomes are stated without accompanying data tables, error bars, implementation details on CSC vs. CSR handling, or exclusion criteria for the matrix suite. This prevents verification that the comparison is fair and that post-hoc choices did not affect the reported conclusions.

    Authors: The abstract is a concise summary and cannot contain full tables or error bars. The full manuscript already presents performance tables, repeated-run statistics (error bars), CSC/CSR implementation differences, and matrix handling details in the Results section. To improve verifiability we will (1) revise the abstract to explicitly reference the Results section for these data and (2) expand the Results section with a dedicated paragraph on exclusion criteria and fairness safeguards if the current text is insufficiently explicit. We cannot embed tabular data in the abstract itself. revision: partial
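The repeated-run statistics behind error bars, as mentioned in the response above, could be gathered along the following lines. This is a minimal sketch: the helper names `time_runs` and `mean_std` are hypothetical, and the page does not describe the paper's actual measurement harness.

```rust
use std::time::Instant;

/// Runs `f` `runs` times and returns the wall-clock seconds of each run.
pub fn time_runs<F: FnMut()>(mut f: F, runs: usize) -> Vec<f64> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_secs_f64()
        })
        .collect()
}

/// Sample mean and sample standard deviation (n - 1 in the denominator).
pub fn mean_std(samples: &[f64]) -> (f64, f64) {
    assert!(samples.len() >= 2);
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}
```

Reporting the mean together with the standard deviation for each matrix and library is what would let a reader judge whether a gap between Rust and a baseline exceeds run-to-run noise.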

Circularity Check

0 steps flagged

No circularity: empirical benchmarks with no derivations or fitted predictions

full rationale

The paper reports measured runtime and performance numbers from direct comparisons of Rust sparse kernels against Eigen, PETSc, PSBLAS, and oneMKL on a fixed matrix suite for SpMV, Lanczos, and matrix-exponential workloads. No equations, first-principles derivations, parameter fits, or predictions appear; the central claim is simply that the observed timings are comparable or trailing. The representativeness of the matrix suite is an external assumption about coverage, not a self-referential definition or reduction of any result to its own inputs. No self-citation chains, uniqueness theorems, or ansatzes are invoked to support the performance statements. The study is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are invoked; the work is an empirical performance study whose central claim rests on the representativeness of the test matrices and workloads.


Supporting Information

The code for running the benchmark is available from the GitHub repository lukefleed/hpla-rs.