Streami: An MPI Data-Parallel Library to Compute Field Lines on GPUs

Andrea Paris; Ingo Wald; Milan Jaros; Stefan Zellmann; Tatiana von Landesberger

arXiv: 2606.02627 · v1 · pith:OHVO2TWFnew · submitted 2026-05-29 · 💻 cs.CE · cs.DC · cs.GR · physics.flu-dyn


Pith reviewed 2026-06-28 19:26 UTC · model grok-4.3


keywords: field line computation, GPU acceleration, MPI parallelism, fluid flows, in-situ analysis, high-performance computing, data-parallel library, post-hoc visualization


The pith

Streami is a thin, GPU-accelerated MPI library for computing field lines in fluid flows; it integrates with existing applications.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Streami as an extensible library that performs field line computations on GPUs for fluid flow data on high-performance systems. It functions as a thin layer usable for either post-hoc analysis after a simulation or in-situ analysis during one, and it connects directly to existing MPI codes without requiring major rewrites. The authors describe the library's API along with the design choices intended to deliver both speed and the ability to handle different representations of flow fields. They also include a sample application that supports quick prototyping and interactive placement of seed points. The library is made available under an open-source license.

Core claim

Streami acts as a thin, extensible layer for field line computation in fluid flows, leveraging GPU acceleration and MPI parallelism to enable efficient post-hoc or in-situ analysis that integrates directly with existing high-performance computing applications.
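A field line (streamline) is an integral curve x(s) of dx/ds = v(x) that starts at a seed point, so the per-particle work of any such tracer is repeated numerical integration through a velocity field. The following is a minimal C++ sketch of that work, using classical fourth-order Runge-Kutta with a fixed step on an analytic field. The names `Vec3`, `rk4Step`, and `traceFieldLine` are hypothetical and are not Streami's API; the integrator and step control the paper actually uses may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical 3-vector type for illustration only.
struct Vec3 {
  double x, y, z;
};

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }

// A velocity field is any callable mapping a position to a velocity.
using VelocityField = std::function<Vec3(Vec3)>;

// One classical RK4 step of size h along dx/ds = v(x).
Vec3 rk4Step(const VelocityField& v, Vec3 p, double h) {
  const Vec3 k1 = v(p);
  const Vec3 k2 = v(p + (0.5 * h) * k1);
  const Vec3 k3 = v(p + (0.5 * h) * k2);
  const Vec3 k4 = v(p + h * k3);
  return p + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4);
}

// Trace a field line from a seed for a fixed number of steps and
// return every vertex, including the seed itself.
std::vector<Vec3> traceFieldLine(const VelocityField& v, Vec3 seed,
                                 double h, int numSteps) {
  assert(numSteps >= 0);
  std::vector<Vec3> line{seed};
  line.reserve(static_cast<std::size_t>(numSteps) + 1);
  for (int i = 0; i < numSteps; ++i) line.push_back(rk4Step(v, line.back(), h));
  return line;
}
```

For example, in the rigid rotation v(x, y, z) = (-y, x, 0), a seed at (1, 0, 0) traced with h = 2π/1000 for 1000 steps ends very close to where it started. Fixed-step RK4 is used here only for brevity; production tracers commonly add adaptive step control and termination criteria.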

What carries the argument

The Streami library API and its design decisions that target high performance and extensibility while supporting varied fluid flow field representations.
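As one concrete example of a field representation, a structured (uniform-grid) field is usually sampled by trilinear interpolation of the eight voxels surrounding a query point. The sketch below is a generic C++ illustration, not code from Streami: `UniformGrid` is a hypothetical layout (x-fastest storage, unit spacing, one scalar per voxel for brevity; a vector field interpolates each component the same way).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical uniform grid with unit spacing: voxel (i, j, k) holds the
// value at integer position (i, j, k), stored x-fastest.
struct UniformGrid {
  int nx, ny, nz;
  std::vector<float> values;

  float at(int i, int j, int k) const {
    return values[(static_cast<std::size_t>(k) * ny + j) * nx + i];
  }

  // Trilinear interpolation; query positions are clamped to the grid bounds.
  float sample(double x, double y, double z) const {
    assert(nx >= 2 && ny >= 2 && nz >= 2);
    x = std::clamp(x, 0.0, nx - 1.0);
    y = std::clamp(y, 0.0, ny - 1.0);
    z = std::clamp(z, 0.0, nz - 1.0);
    // Lower corner of the enclosing cell (kept inside the last cell).
    const int i = std::min(static_cast<int>(x), nx - 2);
    const int j = std::min(static_cast<int>(y), ny - 2);
    const int k = std::min(static_cast<int>(z), nz - 2);
    const double fx = x - i, fy = y - j, fz = z - k;
    auto lerp = [](double a, double b, double t) { return a + t * (b - a); };
    // Interpolate along x, then y, then z.
    const double c00 = lerp(at(i, j, k), at(i + 1, j, k), fx);
    const double c10 = lerp(at(i, j + 1, k), at(i + 1, j + 1, k), fx);
    const double c01 = lerp(at(i, j, k + 1), at(i + 1, j, k + 1), fx);
    const double c11 = lerp(at(i, j + 1, k + 1), at(i + 1, j + 1, k + 1), fx);
    const double c0 = lerp(c00, c10, fy);
    const double c1 = lerp(c01, c11, fy);
    return static_cast<float>(lerp(c0, c1, fz));
  }
};
```

Trilinear interpolation reproduces any field that is linear in x, y, and z exactly, which makes such fields a convenient sanity check. Unstructured or adaptive representations need a point-location step (for example, a spatial acceleration structure) before a comparable interpolation can run.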

If this is right

Existing MPI fluid simulations can add field line computation as a post-processing or in-situ step without large code modifications.
The same library code supports both post-hoc and in-situ analysis modes.
Extensions allow the library to work with multiple different representations of the underlying flow fields.
A provided sample application demonstrates interactive seed point placement for rapid prototyping of visualizations.
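The in-situ usage pattern listed above can be illustrated as a simulation time loop that, every few steps, hands a read-only view of its rank-local field to an analysis callback instead of writing the data to disk. The sketch below is single-process C++ without MPI, and `FieldView`, `InSituHook`, and `runSimulation` are hypothetical names; it shows the general integration shape, not Streami's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical non-owning view of a rank-local velocity block
// (interleaved x, y, z components). No data is copied.
struct FieldView {
  const float* data;
  std::size_t numVoxels;
};

// Hypothetical analysis callback, e.g. one that traces field lines.
using InSituHook = std::function<void(int step, const FieldView&)>;

// Toy "solver": updates its field each step and calls the hook every
// `interval` steps with a view of the current state.
void runSimulation(std::vector<float>& velocity, int numSteps, int interval,
                   const InSituHook& hook) {
  assert(interval > 0 && velocity.size() % 3 == 0);
  for (int step = 1; step <= numSteps; ++step) {
    for (float& u : velocity) u *= 0.99f;  // stand-in for the real update
    if (step % interval == 0)
      hook(step, FieldView{velocity.data(), velocity.size() / 3});
  }
}
```

The design point this illustrates is that the solver keeps ownership of its memory and gains only a single call site, which is the kind of low-intrusion integration the core claim describes.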

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The design could shorten the wall-clock time between running a large fluid simulation and obtaining its field line visualizations on the same machine.
Similar thin-layer GPU libraries might be built for other analysis primitives that are currently done only after data is moved off the supercomputer.
Interactive seed placement could be extended to support live steering of ongoing simulations if the in-situ path is used.

Load-bearing premise

The library's specific design decisions produce both high performance and extensibility that allow it to interface with existing MPI applications and accommodate different flow field representations.

What would settle it

A performance test on representative fluid flow data showing no meaningful GPU speedup over a standard CPU implementation, or an integration test where Streami cannot be called from a typical MPI fluid simulation without substantial code changes, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.02627 by Andrea Paris, Ingo Wald, Milan Jaros, Stefan Zellmann, Tatiana von Landesberger.

**Figure 2.** Class diagram for the distributed vector field abstraction…

**Figure 3.** Turbulent flow, structured field advection, data distributed…
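When the field is distributed across ranks, as in Figure 3, a particle that leaves its rank's subdomain must be handed to the rank that owns its new position; this parallelize-over-data strategy is studied in [19]. The sketch below assumes a hypothetical 1D slab decomposition of [0, 1) along x and only builds the per-destination send buffers; in an MPI code those buffers would feed an exchange such as `MPI_Alltoallv`. Whether Streami decomposes and forwards data exactly this way is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle {
  double x, y, z;
  int id;
};

// Rank owning position x under a hypothetical uniform slab decomposition
// of [0, 1) along x into numRanks slabs; -1 means "left the domain".
int ownerRank(double x, int numRanks) {
  assert(numRanks > 0);
  if (x < 0.0 || x >= 1.0) return -1;
  return std::min(static_cast<int>(x * numRanks), numRanks - 1);
}

// After an advection step, keep the particles still owned by myRank,
// drop those that left the global domain, and sort the rest into one
// outgoing buffer per destination rank.
std::vector<std::vector<Particle>> bucketOutgoing(std::vector<Particle>& local,
                                                  int myRank, int numRanks) {
  std::vector<std::vector<Particle>> outgoing(static_cast<std::size_t>(numRanks));
  std::vector<Particle> kept;
  for (const Particle& p : local) {
    const int dest = ownerRank(p.x, numRanks);
    if (dest < 0) continue;  // terminated: outside the global domain
    if (dest == myRank) kept.push_back(p);
    else outgoing[dest].push_back(p);
  }
  local.swap(kept);
  return outgoing;
}
```

Repeating advect, bucket, and exchange until no particles remain in flight is what makes load balance and communication overhead the central concerns in the cited literature.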

**Figure 6.** Performance on up to 16× A100 GPUs (two nodes with eight GPUs each), for the galaxy data set [20] (1K³ voxels). We report timings for tracing 100K particles, averaged over 10K advection steps. (Unstructured: out-of-memory for 1, 2, and 4 GPUs.)

Section 4 (Sample Apps), excerpt: We implemented two sample apps that use Streami: a command-line app that writes its output to files, and an interactive app for exploration (see …
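For scale, the Figure 6 workload can be sized with simple arithmetic. Assuming "1K³ voxels" means 1024³ and that each voxel stores three 32-bit velocity components (the storage format is an assumption, not stated in the caption):

```latex
\begin{aligned}
N_{\text{vox}} &= 1024^3 = 2^{30} \approx 1.07\times 10^{9},\\
M_{\text{field}} &= 2^{30}\times 3\times 4\ \text{B} = 12\ \text{GiB},\\
M_{\text{field}}/16 &= 0.75\ \text{GiB per GPU at 16 GPUs},\\
W &= 10^{5}\ \text{particles}\times 10^{4}\ \text{steps} = 10^{9}\ \text{particle steps}.
\end{aligned}
```

Under these assumptions the structured field alone would fit on a single A100, which suggests the out-of-memory cases stem from the larger footprint of the unstructured representation; the caption itself does not give the reason.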

**Figure 5.** Sample apps using Streami. Left: streamlines generated with the command-line app for the astrophysics data set from [20], visualized with PyVista. Right: the interactive sample app using Streami, with interactive seed point placement and a volume rendering overlay of the turbulence field, for the wind farm data set from [23].

Accompanying text, excerpt: …base class, while on the device we resort to compile-time polymorphism for perfo…
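The caption text above describes a split between a host-side base class and compile-time polymorphism on the device. One common way to realize such a split, sketched here in plain C++ without CUDA, is a virtual method on the host that, once per batch, instantiates a templated advection loop for the concrete field type, so the per-particle inner loop contains no virtual calls. All names (`HostField`, `ConstantField`, `advectBatch`) are hypothetical, and forward Euler is used only for brevity; in a CUDA build the template would typically become a `__global__` kernel.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Vec3 { double x, y, z; };

// "Device-side" work: a template over the concrete sampler type, so the
// compiler can inline sample() inside the hot loop (no virtual dispatch).
template <class Sampler>
void advectBatch(const Sampler& field, std::vector<Vec3>& particles,
                 double h, int steps) {
  for (Vec3& p : particles)
    for (int i = 0; i < steps; ++i) {  // forward Euler for brevity
      const Vec3 v = field.sample(p);
      p = {p.x + h * v.x, p.y + h * v.y, p.z + h * v.z};
    }
}

// Host-side interface: one virtual call per batch, not per sample.
struct HostField {
  virtual ~HostField() = default;
  virtual void advect(std::vector<Vec3>& particles, double h, int steps) const = 0;
};

// One concrete representation: a uniform (constant) velocity.
struct ConstantField final : HostField {
  Vec3 u;
  explicit ConstantField(Vec3 u_) : u(u_) {}
  Vec3 sample(const Vec3&) const { return u; }
  void advect(std::vector<Vec3>& particles, double h, int steps) const override {
    advectBatch(*this, particles, h, steps);  // Sampler deduced as ConstantField
  }
};
```

Adding a new representation then means writing one class with a non-virtual `sample` and a one-line `advect` override, which is one plausible reading of how extensibility and device performance can coexist.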

Original abstract

We present Streami, an extensible GPU-accelerated library for the computation of field lines in fluid flows on high-performance computers. Streami acts as a thin layer used for both post-hoc and in-situ analysis and can interface with existing MPI applications. We discuss Streami's application programming interface, key design decisions that led to Streami's high performance and extensibility, as well as extensions to support different fluid flow field representations. We also present a sample application for rapid prototyping and interactive seed point placement. Streami is released under a permissive open-source software license.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Streami is a new open-source library for GPU field-line tracing in MPI fluid simulations, but the abstract is mostly descriptive and reports no benchmarks or comparisons.


Streami is presented as a thin MPI layer for computing field lines on GPUs, usable for post-hoc or in-situ analysis in fluid flows. The paper covers the API, design choices aimed at performance and extensibility, support for different field representations, and a sample app for seed placement. It is released under a permissive license.

The work is new in the sense that it packages these capabilities into one library targeted at high-performance GPU setups and existing MPI codes. That could save time for people who need field-line tracing without building it from scratch.

The main limitation, as far as the abstract shows, is the absence of performance numbers, scaling results, or direct comparisons to prior tools. The abstract asserts high performance from the design decisions, but without data it is difficult to assess whether those decisions actually deliver. If the full paper includes benchmarks or reproducible tests, that would change the picture.

This is for computational fluid dynamics researchers who run on GPU clusters and want a ready library rather than a new algorithm or theorem. A reader looking for implementation details and an open-source starting point could find it useful.

It deserves peer review to check the code quality, the actual performance claims, and how well it integrates with real applications. The contribution is modest but practical, so a referee could help clarify its scope.

Referee Report

1 major / 1 minor

Summary. The manuscript presents Streami, an extensible GPU-accelerated MPI library for computing field lines in fluid flows. It positions the library as a thin layer for post-hoc or in-situ analysis that interfaces with existing MPI applications, describes the API and key design decisions for performance and extensibility, details extensions supporting varied fluid flow field representations, and includes a sample application for rapid prototyping and interactive seed placement. The software is released under a permissive open-source license.

Significance. If the implementation and design choices deliver the claimed performance and extensibility, Streami could serve as a practical tool for integrating field-line tracing into HPC fluid-dynamics workflows, reducing the barrier to GPU-accelerated in-situ analysis within existing MPI codes.

major comments (1)

[Abstract] The central claims that 'key design decisions led to Streami's high performance and extensibility' and that the library 'can interface with existing MPI applications' are presented without any supporting benchmarks, timing data, scaling results, or implementation metrics. This absence is load-bearing for a software-contribution paper whose value rests on those performance and integration properties.

minor comments (1)

The manuscript would benefit from at least one concrete code example or API call sequence illustrating how an existing MPI application would invoke Streami for in-situ tracing.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and positive assessment of Streami's potential utility in HPC workflows. We address the single major comment below.


Referee: [Abstract] The central claims that 'key design decisions led to Streami's high performance and extensibility' and that the library 'can interface with existing MPI applications' are presented without any supporting benchmarks, timing data, scaling results, or implementation metrics. This absence is load-bearing for a software-contribution paper whose value rests on those performance and integration properties.

Authors: We agree that the abstract asserts performance and extensibility benefits, as well as MPI interoperability, without accompanying quantitative evidence. The body of the manuscript focuses on the API, design rationale, and extensibility mechanisms rather than empirical results. To address this, we will revise the abstract to remove unsubstantiated performance claims and add a new results section containing benchmarks, timing measurements, weak/strong scaling data, and concrete examples of integration with existing MPI codes. These additions will directly support the design decisions described. revision: yes

Circularity Check

0 steps flagged

No significant circularity; software library description


The paper is a descriptive contribution presenting a GPU-accelerated MPI library for field line computation. It contains no equations, derivations, predictions, fitted parameters, or load-bearing self-citations. The central claims concern API design, performance choices, extensibility, and a sample application, all of which are independent of any internal reduction to inputs. This is a standard honest finding for software artifact papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a software library presentation with no mathematical model, free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5634 in / 996 out tokens · 31061 ms · 2026-06-28T19:26:45.865331+00:00


Reference graph

Works this paper leans on

22 extracted references · 11 canonical work pages

[1] J. Ahrens, B. Geveci, and C. Law. ParaView: An End-User Tool for Large Data Visualization. In Visualization Handbook, pp. 717–731. Elsevier, 2005.

[2] B. Burkhart, S. Appel, S. Bialy, J. Cho, A. Christensen, D. Collins, C. Federrath, D. Fielding, D. Finkbeiner, A. Hill, et al. The catalogue for astrophysical turbulence simulations (CATS). The Astrophysical Journal, 905(1):14, 2020.

[3] H. Childs, E. Brugger, B. Whitlock, J. Meredith, S. Ahern, K. Bonnell, M. Miller, G. H. Weber, C. Harrison, D. Pugmire, T. Fogal, C. Garth, A. Sanderson, E. W. Bethel, M. Durant, D. Camp, J. M. Favre, O. Rübel, P. Navrátil, M. Wheeler, P. Selby, and F. Vivodtzev. VisIt: An End-User Tool For Visualizing and Analyzing Very Large Data. In Proceedings of ..., 2011.

[5] Kitware. Catalyst2: GPU resident workflows, 2024. https://www.kitware.com/catalyst2-gpu-resident-workflows/

[6] M. Larsen, E. Brugger, H. Childs, and C. Harrison. Ascent: A flyweight in situ library for exascale simulations. In H. Childs, J. C. Bennett, and C. Garth, eds., In Situ Visualization for Computational Science, pp. 255–279. Springer International Publishing, Cham, 2022.

[7] S. F. Matringe, R. Juanes, and H. A. Tchelepi. Robust streamline tracing for the simulation of porous media flow on general triangular and quadrilateral grids. Journal of Computational Physics, 219(2):992–1012, 2006. doi: 10.1016/j.jcp.2006.07.004

[8] K. Moreland, C. Sewell, W. Usher, L.-t. Lo, J. Meredith, D. Pugmire, J. Kress, H. Schroots, K.-L. Ma, H. Childs, M. Larsen, C.-M. Chen, R. Maynard, and B. Geveci. VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures. IEEE Computer Graphics and Applications, 36(3), 2016. doi: 10.1109/MCG.2016.48

[9] B. Nouanesengsy, T.-Y. Lee, and H.-W. Shen. Load-balanced parallel streamline generation on large scale vector fields. IEEE Transactions on Visualization and Computer Graphics, 17(12):1785–1794, 2011. doi: 10.1109/TVCG.2011.219

[10] cuBQL - The CUDA BVH Build and Query Library. Available at https://github.com/NVIDIA/cuBQL, Accessed: Feb 26, 2026.

[11] R. Ohana, M. McCabe, L. Meyer, R. Morel, F. J. Agocs, M. Beneitez, M. Berger, B. Burkhart, S. B. Dalziel, D. B. Fielding, et al. The well: a large-scale collection of diverse physics simulations for machine learning. Advances in Neural Information Processing Systems, 37:44989–45037, 2024.

[12] T. Peterka, R. Ross, B. Nouanesengsy, T.-Y. Lee, H.-W. Shen, W. Kendall, and J. Huang. A study of parallel particle tracing for steady-state and time-varying flow fields. In 2011 IEEE International Parallel & Distributed Processing Symposium, pp. 580–591, 2011. doi: 10.1109/IPDPS.2011.62

[13] D. Pugmire, H. Childs, C. Garth, S. Ahern, and G. H. Weber. Scalable computation of streamlines on very large datasets. In Proc. Supercomputing SC09. Portland, OR, USA, Nov. 2009. LBNL-3264E.

[14] D. Pugmire, A. Yenpure, M. Kim, J. Kress, R. Maynard, H. Childs, and B. Hentschel. Performance-Portable Particle Advection with VTK-m. In H. Childs and F. Cucchietti, eds., Eurographics Symposium on Parallel Graphics and Visualization. The Eurographics Association, 2018. doi: 10.2312/pgv.20181094

[15] W. Schroeder, K. Martin, and B. Lorensen. The Visualization Toolkit (4th ed.). Kitware, 2006.

[16] I. Wald. rafi - the RAy Forwarding Infrastructure Library. Available at https://github.com/ingowald/rafi, Accessed: Apr 29, 2026.

[17] I. Wald, G. Johnson, J. Amstutz, C. Brownlee, A. Knoll, J. Jeffers, J. Günther, and P. Navratil. OSPRay - A CPU ray tracing framework for scientific visualization. IEEE Transactions on Visualization and Computer Graphics, 23(1):931–940, 2017.

[18] I. Wald, S. Zellmann, J. Amstutz, Q. Wu, K. Griffin, M. Jaros, and S. Wesner. Standardized Data-Parallel Rendering Using ANARI. In 2024 IEEE 14th Symposium on Large Data Analysis and Visualization (LDAV), pp. 23–32, 2024. doi: 10.1109/LDAV64567.2024.00013

[19] Z. Wang, K. Moreland, M. Larsen, J. Kress, H. Childs, and D. Pugmire. Parallelize Over Data Particle Advection: Participation, Ping Pong Particles, and Overhead. IEEE Transactions on Visualization and Computer Graphics, 31(10):7795–7808, 2025. doi: 10.1109/TVCG.2025.3557453

[20] R. Wissing and S. Shen. Numerical dependencies of the galactic dynamo in isolated galaxies with SPH. Astronomy & Astrophysics, 673:A47, May 2023. doi: 10.1051/0004-6361/202244753

[21] A. Yenpure, S. Sane, R. Binyahib, D. Pugmire, C. Garth, and H. Childs. State-of-the-Art Report on Optimizing Particle Advection Performance. Computer Graphics Forum, 2023. doi: 10.1111/cgf.14858

[22] S. Zellmann, D. Seifried, N. Morrical, I. Wald, W. Usher, J. A. P. Law-Smith, S. Walch-Gassner, and A. Hinkenjann. Point Containment Queries on Ray-Tracing Cores for AMR Flow Visualization. Computing in Science & Engineering, 24(2):40–51, 2022. doi: 10.1109/MCSE.2022.3153677

[23] X. Zhu, S. Xiao, G. Narasimhan, L. A. Martinez-Tossas, M. Schnaubelt, G. Lemson, H. Yao, A. S. Szalay, D. F. Gayme, and C. Meneveau. JHTDB-wind: a web-accessible large-eddy simulation database of a wind farm with virtual sensor querying. Wind Energy Science, 10(12):2821–2840, 2025. doi: 10.5194/wes-10-2821-2025
