Concepts in Practice: C++ MPI Bindings for the HPC Ecosystem. From a Standardizable Core to a Composable Interface

Daniel Brommer; Matthias Schimek; Tim Niklas Uhl

arxiv: 2606.09102 · v1 · pith:MZUB3ZL2new · submitted 2026-06-08 · 💻 cs.DC

Pith reviewed 2026-06-27 15:03 UTC · model grok-4.3

classification 💻 cs.DC

keywords: C++ MPI bindings, C++20 concepts, MPI wrappers, HPC ecosystem, KaMPIng, Kokkos, SYCL, GPU integration

The pith

A core layer of refined C++20 concepts delivers a low-level native C++ MPI interface that works directly with STL containers and extends to GPU libraries via adapters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes the first concrete layered architecture for modern C++ MPI bindings based on previously proposed design principles using C++20 concepts. At its foundation is a core layer that formalizes MPI data buffers through refined concepts, automatically maps standard C++ constructs, supplies non-intrusive customization points, and provides concept-based procedure wrappers. This produces an extensible low-level interface compatible with STL containers and suitable for standardization. The core then supports a higher-level library with pipe-based syntax and lightweight adapters that allow direct MPI use of Kokkos views, Thrust device vectors, and SYCL buffers. Readers would care because it addresses the long-standing absence of official C++ bindings with a general-purpose, performance-preserving design backed by a reference implementation.

Core claim

The paper presents the first concrete realization of design principles for modern C++ MPI bindings in a layered architecture. At the foundation is a core layer of refined C++20 concepts that formalize the MPI standard's notion of data buffers, enable automatic mapping of standard C++ constructs, supply non-intrusive customization points for third-party types, and supply concept-based wrappers for MPI procedures. The result is a low-level native C++ MPI interface that works directly with STL containers, is highly extensible, and lends itself to standardization. Built on this core is KaMPIng-v2, a library offering convenience and memory-safety with composable pipe-based syntax. The core also s

What carries the argument

The core layer of refined C++20 concepts that formalize MPI's notion of data buffers, map C++ constructs automatically, and supply non-intrusive customization points together with concept-based wrappers for MPI procedures.

If this is right

The core layer produces a low-level native C++ MPI interface that accepts STL containers directly without additional boilerplate.
KaMPIng-v2 supplies memory-safe MPI programming with composable pipe-based syntax inspired by C++ ranges.
Lightweight adapters integrate Kokkos views, Thrust device vectors, and SYCL buffers as first-class participants in MPI calls.
The design remains self-contained for third-party libraries and supports potential standardization through its use of standard C++ mechanisms.
The architecture demonstrates practical viability through a fully functional open-source reference implementation.
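The pipe-based syntax listed above can be illustrated with a small sketch of a single adapter that overrides the element count of a buffer. Everything here is hypothetical: `with_count`, `counted_view`, and `fake_send` are names invented for illustration and are not taken from KaMPIng-v2, and `fake_send` merely sums the elements it would transmit instead of calling MPI.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Non-owning view that pairs a buffer with an overridden element count.
template <class Buf>
struct counted_view {
    Buf* base;
    std::size_t n;
    auto data() const { return base->data(); }
    std::size_t size() const { return n; }
};

// Closure object produced by with_count(n); applied with operator|.
struct with_count_closure { std::size_t n; };

inline with_count_closure with_count(std::size_t n) { return {n}; }

// `buffer | with_count(n)` attaches the count without copying the buffer.
template <class Buf>
counted_view<Buf> operator|(Buf& buf, with_count_closure c) {
    return {&buf, c.n};
}

// Stand-in for an MPI procedure wrapper (assumption: no MPI call is made).
// It sums exactly the elements that would be sent.
template <class B>
long fake_send(const B& b) {
    long sum = 0;
    for (std::size_t i = 0; i < b.size(); ++i) sum += b.data()[i];
    return sum;
}

} // namespace sketch
```

A call such as `sketch::fake_send(v | sketch::with_count(2))` then operates on the first two elements of `v` only. The sketch covers one adapter applied to an lvalue buffer; chaining several adapters would need additional overloads.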

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The non-intrusive customization points could allow existing C++ codebases to adopt the bindings without modifying their own type definitions.
Similar concept-based layering might apply to bindings for other distributed communication standards beyond MPI.
The separation of core and adapters could simplify maintenance when new performance-portability libraries emerge.
Direct integration of GPU containers into MPI could reduce data movement overhead in heterogeneous HPC applications.

Load-bearing premise

Refined C++20 concepts can formalize MPI's notion of data buffers and provide non-intrusive customization points while preserving performance and compatibility with existing MPI implementations.

What would settle it

A test case in which the concept-based wrappers fail to compile or execute correctly with a standard STL container such as std::vector, or where the adapters for SYCL buffers introduce runtime incompatibility with an existing MPI implementation, would falsify the core claim.

Figures

Figures reproduced from arXiv: 2606.09102 by Daniel Brommer, Matthias Schimek, Tim Niklas Uhl.

**Figure 2.** The core buffer protocol. […] For contiguous ranges, mpi::ptr() is defined via std::ranges::data(), sized ranges default mpi::count() to std::ranges::size(), and for ranges whose range_value_t is an MPI builtin type, mpi::type() automatically returns the matching MPI_Datatype. The generalized data buffer concept resolves both shortcomings: the trait level allows any type (GPU containers, MPI_BOTTOM, ar…

**Figure 3.** Uniform use of MPI communicators across native handles, non-owning…

**Figure 4.** Concept-based implementation of mpi::allgatherv, illustrating the mapping of MPI C calls to the C++ core interface using the dispatch system. […] It consists of three components: (1) pipe adapters in the style of std::views that attach or override MPI metadata (datatype, count, per-rank counts and displacements), turning arbitrary objects into data buffers or modifying existing ones, and composing freely with…

**Figure 5.** Standard library and KaMPIng-v2 buffer adaptors. Examples (1) uses…

**Figure 6.** Non-blocking irecv: a receive buffer is moved into iresult, which prevents access until .wait() returns it. […] introduced in KaMPIng [23] and formalized in [2]; KaMPIng-v2 realizes it on top of the buffer concept and view pipeline introduced above. We address memory-safety for non-blocking communication through perfect forwarding of buffer arguments combined with a move-only iresult handle. When a buffer is p…

**Figure 7.** Ecosystem adapters: Kokkos (left) and SYCL (right).
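The ownership pattern in the Figure 6 caption, a receive buffer moved into a move-only handle and handed back by `.wait()`, can be sketched as follows. This is an illustration under stated assumptions rather than the KaMPIng-v2 implementation: the class reuses the name `iresult` from the caption, but its constructor, the `std::function` completion callback, and `fake_irecv` are hypothetical stand-ins for a real `MPI_Irecv`/`MPI_Wait` pair. Only the case where the buffer is passed by rvalue is modelled.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Move-only handle that owns the buffer while the operation is pending.
template <class Buf>
class iresult {
public:
    iresult(Buf buf, std::function<void(Buf&)> complete)
        : buf_(std::move(buf)), complete_(std::move(complete)) {}

    iresult(const iresult&) = delete;
    iresult& operator=(const iresult&) = delete;
    iresult(iresult&&) = default;
    iresult& operator=(iresult&&) = default;

    // Rvalue-qualified: calling wait() consumes the handle and
    // returns ownership of the buffer to the caller.
    Buf wait() && {
        complete_(buf_);  // stand-in for MPI_Wait completing the receive
        return std::move(buf_);
    }

private:
    Buf buf_;  // private, so the buffer cannot be touched before wait()
    std::function<void(Buf&)> complete_;
};

// Stand-in for a non-blocking receive: on completion, fills the buffer
// with `value` (assumption: no real communication takes place).
inline iresult<std::vector<int>> fake_irecv(std::vector<int> buf, int value) {
    return {std::move(buf),
            [value](std::vector<int>& b) { for (auto& x : b) x = value; }};
}

static_assert(!std::is_copy_constructible_v<iresult<std::vector<int>>>);

} // namespace sketch
```

Usage follows the shape `auto pending = sketch::fake_irecv(std::move(recv), 7); auto done = std::move(pending).wait();`. Between the two statements there is no accessor that reaches the buffer.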

Original abstract

The official C++ MPI bindings were removed from the standard in 2008, leaving a gap that numerous third-party libraries have attempted to fill. However, existing wrappers typically cover only a limited subset of MPI or target specific use cases, falling short of a general-purpose solution. A recent conceptual paper proposed general design principles for modern C++ bindings based on C++20 concepts, without committing to a concrete interface. We present the first concrete realization of these principles in a layered architecture. At the foundation, we define a core layer: refined C++20 concepts formalizing the MPI standard's notion of data buffers, automatic mapping of standard C++ constructs, non-intrusive customization points for third-party types, and concept-based wrappers for MPI procedures. The result is a low-level native C++ MPI interface that works directly with STL containers, is highly extensible, and lends itself to standardization. Built on this core, we present KaMPIng-v2 -- a C++ MPI library offering the convenience and memory-safety of KaMPIng with composable, pipe-based syntax inspired by C++ ranges for efficient, boilerplate-free MPI programming. Finally, we demonstrate the core layer's broad applicability by designing lightweight adapters for GPU and performance-portability libraries, making the HPC ecosystem a first-class citizen in MPI. Kokkos views, Thrust device vectors, and SYCL buffers can be passed directly to MPI procedures, with adapter logic remaining self-contained. All contributions are backed by a fully functional open-source reference implementation, demonstrating the practical viability of the proposed design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper ships the first working C++20-concept-based MPI core layer plus KaMPIng-v2 and GPU adapters, backed by open-source code.

This paper gives the first concrete C++ implementation of the conceptual MPI bindings design from earlier work. They define a core layer with refined concepts for data buffers that integrate with STL containers and allow third-party extensions without intrusion. On top of that they build KaMPIng-v2 with pipe-based syntax, and they add adapters so Kokkos views, Thrust vectors, and SYCL buffers can be used directly with MPI calls.

The work is solid on the design side. The layered architecture separates the low-level interface from the convenience layer, which is a sensible choice for something that might go into a standard. Shipping a working reference implementation is the right move here, and it shows the adapters stay self-contained.

The main limitation is the lack of any performance measurements or benchmarks in the description. We get the claim that it preserves compatibility and performance with existing MPI, but without data it's hard to judge how well the concept wrappers and adapters actually perform in practice. That said, the central idea is instantiated in code rather than left abstract.

This is for HPC developers and library authors who need better C++ interfaces for MPI, especially those dealing with GPU portability. A reader looking for practical modern bindings would find the syntax examples and adapter patterns useful.

It should go to peer review. The implementation gives it enough substance to warrant referee input on the design choices and extensibility.

Referee Report

1 major / 2 minor

Summary. The paper presents the first concrete realization of previously proposed design principles for modern C++ MPI bindings. It introduces a layered architecture whose core layer uses refined C++20 concepts to formalize MPI data buffers, support automatic mapping of STL constructs, and provide non-intrusive customization points; this core underpins KaMPIng-v2 (with pipe-based, ranges-inspired syntax) and lightweight adapters that allow direct use of Kokkos views, Thrust device vectors, and SYCL buffers with MPI procedures. All elements are backed by a fully functional open-source reference implementation.

Significance. If the design and implementation hold, the work supplies a practical, extensible, and potentially standardizable low-level C++ MPI interface that directly addresses the gap left by the 2008 removal of official bindings while integrating the HPC ecosystem (including GPU and performance-portability libraries) as first-class citizens. Explicit strengths include the open-source reference implementation, the working adapters for Kokkos/Thrust/SYCL, and the demonstration that C++20 concepts can serve as non-intrusive customization points without altering existing MPI implementations.

major comments (1)

Abstract: the claim that the core layer 'preserves performance and compatibility' while using refined C++20 concepts for buffer handling is load-bearing for the stated practical viability, yet the manuscript supplies no benchmarks, timing data, or comparisons against existing wrappers or raw MPI; this prevents assessment of whether the concept-based dispatch and adapters incur measurable overhead.

minor comments (2)

The manuscript would benefit from an explicit related-work section (or expanded discussion in the introduction) that systematically contrasts the proposed core against the 'numerous third-party libraries' mentioned, citing their coverage limitations.
Notation for the concept definitions and customization points should be clarified with a small table or diagram early in the core-layer description to aid readers unfamiliar with C++20 concepts.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and positive assessment of the work's significance. We address the major comment below.

Referee: Abstract: the claim that the core layer 'preserves performance and compatibility' while using refined C++20 concepts for buffer handling is load-bearing for the stated practical viability, yet the manuscript supplies no benchmarks, timing data, or comparisons against existing wrappers or raw MPI; this prevents assessment of whether the concept-based dispatch and adapters incur measurable overhead.

Authors: We agree that the absence of explicit benchmarks leaves the performance claim unquantified. The core layer is intentionally designed around C++20 concepts to enable zero-overhead static dispatch (resolved entirely at compile time) and thin, non-intrusive adapters that forward directly to the underlying MPI calls without additional copies or runtime indirection. This mirrors the zero-cost abstraction principle used in the STL and ranges library. Nevertheless, to allow readers to verify the claim, we will add a dedicated evaluation section containing microbenchmarks that compare the core layer and the Kokkos/Thrust/SYCL adapters against raw MPI and selected third-party wrappers. The revised manuscript will therefore include timing data and overhead measurements.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper presents a concrete layered C++ design and reference implementation for MPI bindings, building on a prior conceptual paper's principles without committing to an interface. No equations, fitted parameters, or predictions are involved; the work supplies working code, adapters for Kokkos/Thrust/SYCL, and non-intrusive customization points. The derivation chain consists of design choices instantiated directly in the open-source implementation rather than reducing to self-citation, self-definition, or renamed inputs. The cited conceptual paper is treated as external motivation, not a load-bearing uniqueness theorem or ansatz smuggled in. This is a standard non-circular design/implementation paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The design rests on the assumption that C++20 concepts are suitable for formalizing MPI operations; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: C++20 concepts can be refined to formalize MPI data buffers and procedures while remaining compatible with the MPI standard.
The abstract states that the core layer uses refined C++20 concepts to formalize MPI notions.

pith-pipeline@v0.9.1-grok · 5827 in / 1249 out tokens · 19887 ms · 2026-06-27T15:03:24.879567+00:00 · methodology

Reference graph

Works this paper leans on

23 extracted references · 7 canonical work pages · 1 internal anchor

[1] Avans, C.N., Ciesko, J., Pearson, C., Suggs, E.D., Olivier, S.L., Skjellum, A.: Performance insights into supporting Kokkos views in the Kokkos Comm MPI library. In: IEEE CLUSTER Workshops. pp. 186–187 (2024). https://doi.org/10.1109/CLUSTERWorkshops61563.2024.00051, https://github.com/kokkos/kokkos-comm

[2] Avans, C.N., Correa, A.A., Ghosh, S., Schimek, M., Schuchart, J., Skjellum, A., Suggs, E.D., Uhl, T.N.: Concepts for designing modern C++ interfaces for MPI. In: EuroMPI. pp. 165–183. Lecture Notes in Computer Science, Springer (2025). https://doi.org/10.1007/978-3-032-07194-1_10

[3] Bauke, H.: MPL - a message passing library (2015), https://github.com/rabauke/mpl

[4] Bell, N., Hoberock, J.: Thrust: A productivity-oriented library for CUDA. In: GPU Computing Gems Jade Edition, pp. 359–371. Elsevier (2012)

[5] Beni, M.S., Crisci, L., Cosenza, B.: EMPI: Enhanced Message Passing Interface in Modern C++. In: IEEE/ACM CCGrid. pp. 141–153 (2023). https://doi.org/10.1109/CCGrid57682.2023.00023

[6] Byrne, S., Wilcox, L.C., Churavy, V.: MPI.jl: Julia bindings for the message passing interface. Proceedings of the JuliaCon Conferences 1(1), 68 (2021). https://doi.org/10.21105/jcon.00068

[7] Childers, W., Dimov, P., Katz, D., Revzin, B., Sutton, A., Vali, F., Vandevoorde, D.: Reflection for C++26. Standard Proposal P2996R13, ISO/IEC JTC1/SC22/WG21 (2025), https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p2996r13.html

[8] Correa, A.A.: B-MPI3 (2018), https://github.com/LLNL/b-mpi3

[9] Demiralp, A.C., Martin, P., Sakic, N., Krüger, M., Gerrits, T.: A C++20 interface for MPI 4.0. CoRR abs/2306.11840 (2023)

[10] Edwards, H.C., Trott, C.R., Sunderland, D.: Kokkos: Enabling manycore performance portability through polymorphic memory access patterns. Journal of Parallel and Distributed Computing pp. 3202–3216 (2014). https://doi.org/10.1016/j.jpdc.2014.07.003, Domain-Specific Languages and High-Level Frameworks for High-Performance Computing Con...

[11] Ghosh, S., Alsobrooks, C., Rüfenacht, M., Skjellum, A., Bangalore, P.V., Lumsdaine, A.: Towards modern C++ language support for MPI. In: 2021 Workshop on Exascale MPI (ExaMPI). pp. 27–35. IEEE (2021)

[12] Gregor, D., Troyer, M.: Boost.MPI (2005–2007), https://www.boost.org/doc/libs/1_84_0/doc/html/mpi.html, version 1.84

[13] Message Passing Interface Forum: MPI: A Message-Passing Interface Standard Version 5.0 (Jun 2025), https://www.mpi-forum.org/docs/mpi-5.0/mpi50-report.pdf

[14] Niebler, E., Carter, C., Di Bella, C.: The one ranges proposal. Standard Proposal P0896R4, ISO/IEC JTC1/SC22/WG21 (Nov 2018), https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0896r4.pdf

[15] NVIDIA: Thrust: The C++ parallel algorithms library (2025), https://github.com/NVIDIA/cccl, part of the CUDA Core Compute Libraries (CCCL)

[16] Pellegrini, S., Prodan, R., Fahringer, T.: A lightweight C++ interface to MPI. In: Stotzka, R., Schiffers, M., Cotronis, Y. (eds.) Proc. of the 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). pp. 3–10. IEEE (2012). https://doi.org/10.1109/PDP.2012.42

[17] Polukhin, A.: Boost.pfr (2016), https://www.boost.org/doc/libs/1_84_0/doc/html/boost_pfr.html

[18] Skjellum, A., Wooley, D.G., Lu, Z., Wolf, M., Bangalore, P.V., Lumsdaine, A., Squyres, J.M., McCandless, B.: Object-oriented analysis and design of the message passing interface. Concurrency and Computation: Practice and Experience 13(4), 245–292 (2001). https://doi.org/10.1002/cpe.556, https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.556

[19] Steinbusch, B., Gaspar, A., Brown, J.: rsmpi - MPI bindings for Rust (2015), https://github.com/rsmpi/rsmpi

[20] Stroustrup, B.: The design and evolution of C++. Addison-Wesley (1994)

[21] Stroustrup, B., Sutter, H., et al.: C++ core guidelines (2024), https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines.html

[22] The Khronos SYCL Working Group: SYCL 2020 specification. Specification Revision 11, The Khronos Group Inc. (2025), https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html

[23] Uhl, T.N., Schimek, M., Hübner, L., Hespe, D., Kurpicz, F., Seemaier, D., Stelz, C., Sanders, P.: KaMPIng: Flexible and (near) zero-overhead C++ bindings for MPI. In: Intl. Conf. for High Performance Computing, Networking, Storage, and Analysis (SC). IEEE (2024). https://doi.org/10.1109/SC41406.2024.00050
