Concepts in Practice: C++ MPI Bindings for the HPC Ecosystem. From a Standardizable Core to a Composable Interface
Pith reviewed 2026-06-27 15:03 UTC · model grok-4.3
The pith
A core layer of refined C++20 concepts delivers a low-level native C++ MPI interface that works directly with STL containers and extends to GPU libraries via adapters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents the first concrete realization of design principles for modern C++ MPI bindings in a layered architecture. At the foundation is a core layer of refined C++20 concepts that formalize the MPI standard's notion of data buffers, enable automatic mapping of standard C++ constructs, supply non-intrusive customization points for third-party types, and supply concept-based wrappers for MPI procedures. The result is a low-level native C++ MPI interface that works directly with STL containers, is highly extensible, and lends itself to standardization. Built on this core is KaMPIng-v2, a library offering convenience and memory-safety with composable pipe-based syntax. The core also s
What carries the argument
The core layer of refined C++20 concepts that formalize MPI's notion of data buffers, map C++ constructs automatically, and supply non-intrusive customization points together with concept-based wrappers for MPI procedures.
If this is right
- The core layer produces a low-level native C++ MPI interface that accepts STL containers directly without additional boilerplate.
- KaMPIng-v2 supplies memory-safe MPI programming with composable pipe-based syntax inspired by C++ ranges.
- Lightweight adapters integrate Kokkos views, Thrust device vectors, and SYCL buffers as first-class participants in MPI calls.
- The design remains self-contained for third-party libraries and supports potential standardization through its use of standard C++ mechanisms.
- The architecture demonstrates practical viability through a fully functional open-source reference implementation.
Where Pith is reading between the lines
- The non-intrusive customization points could allow existing C++ codebases to adopt the bindings without modifying their own type definitions.
- Similar concept-based layering might apply to bindings for other distributed communication standards beyond MPI.
- The separation of core and adapters could simplify maintenance when new performance-portability libraries emerge.
- Direct integration of GPU containers into MPI could reduce data movement overhead in heterogeneous HPC applications.
Load-bearing premise
Refined C++20 concepts can formalize MPI's notion of data buffers and provide non-intrusive customization points while preserving performance and compatibility with existing MPI implementations.
What would settle it
A test case in which the concept-based wrappers fail to compile or execute correctly with a standard STL container such as std::vector, or where the adapters for SYCL buffers introduce runtime incompatibility with an existing MPI implementation, would falsify the core claim.
Figures
read the original abstract
The official C++ MPI bindings were removed from the standard in 2008, leaving a gap that numerous third-party libraries have attempted to fill. However, existing wrappers typically cover only a limited subset of MPI or target specific use cases, falling short of a general-purpose solution. A recent conceptual paper proposed general design principles for modern C++ bindings based on C++20 concepts, without committing to a concrete interface. We present the first concrete realization of these principles in a layered architecture. At the foundation, we define a core layer: refined C++20 concepts formalizing the MPI standard's notion of data buffers, automatic mapping of standard C++ constructs, non-intrusive customization points for third-party types, and concept-based wrappers for MPI procedures. The result is a low-level native C++ MPI interface that works directly with STL containers, is highly extensible, and lends itself to standardization. Built on this core, we present KaMPIng-v2 -- a C++ MPI library offering the convenience and memory-safety of KaMPIng with composable, pipe-based syntax inspired by C++ ranges for efficient, boilerplate-free MPI programming. Finally, we demonstrate the core layer's broad applicability by designing lightweight adapters for GPU and performance-portability libraries, making the HPC ecosystem a first-class citizen in MPI. Kokkos views, Thrust device vectors, and SYCL buffers can be passed directly to MPI procedures, with adapter logic remaining self-contained. All contributions are backed by a fully functional open-source reference implementation, demonstrating the practical viability of the proposed design.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first concrete realization of previously proposed design principles for modern C++ MPI bindings. It introduces a layered architecture whose core layer uses refined C++20 concepts to formalize MPI data buffers, support automatic mapping of STL constructs, and provide non-intrusive customization points; this core underpins KaMPIng-v2 (with pipe-based, ranges-inspired syntax) and lightweight adapters that allow direct use of Kokkos views, Thrust device vectors, and SYCL buffers with MPI procedures. All elements are backed by a fully functional open-source reference implementation.
Significance. If the design and implementation hold, the work supplies a practical, extensible, and potentially standardizable low-level C++ MPI interface that directly addresses the gap left by the 2008 removal of official bindings while integrating the HPC ecosystem (including GPU and performance-portability libraries) as first-class citizens. Explicit strengths include the open-source reference implementation, the working adapters for Kokkos/Thrust/SYCL, and the demonstration that C++20 concepts can serve as non-intrusive customization points without altering existing MPI implementations.
major comments (1)
- Abstract: the claim that the core layer 'preserves performance and compatibility' while using refined C++20 concepts for buffer handling is load-bearing for the stated practical viability, yet the manuscript supplies no benchmarks, timing data, or comparisons against existing wrappers or raw MPI; this prevents assessment of whether the concept-based dispatch and adapters incur measurable overhead.
minor comments (2)
- The manuscript would benefit from an explicit related-work section (or expanded discussion in the introduction) that systematically contrasts the proposed core against the 'numerous third-party libraries' mentioned, citing their coverage limitations.
- Notation for the concept definitions and customization points should be clarified with a small table or diagram early in the core-layer description to aid readers unfamiliar with C++20 concepts.
Simulated Author's Rebuttal
We thank the referee for the constructive review and positive assessment of the work's significance. We address the major comment below.
read point-by-point responses
-
Referee: [—] Abstract: the claim that the core layer 'preserves performance and compatibility' while using refined C++20 concepts for buffer handling is load-bearing for the stated practical viability, yet the manuscript supplies no benchmarks, timing data, or comparisons against existing wrappers or raw MPI; this prevents assessment of whether the concept-based dispatch and adapters incur measurable overhead.
Authors: We agree that the absence of explicit benchmarks leaves the performance claim unquantified. The core layer is intentionally designed around C++20 concepts to enable zero-overhead static dispatch (resolved entirely at compile time) and thin, non-intrusive adapters that forward directly to the underlying MPI calls without additional copies or runtime indirection. This mirrors the zero-cost abstraction principle used in the STL and ranges library. Nevertheless, to allow readers to verify the claim, we will add a dedicated evaluation section containing microbenchmarks that compare the core layer and the Kokkos/Thrust/SYCL adapters against raw MPI and selected third-party wrappers. The revised manuscript will therefore include timing data and overhead measurements. revision: yes
Circularity Check
No significant circularity
full rationale
The paper presents a concrete layered C++ design and reference implementation for MPI bindings, building on a prior conceptual paper's principles without committing to an interface. No equations, fitted parameters, or predictions are involved; the work supplies working code, adapters for Kokkos/Thrust/SYCL, and non-intrusive customization points. The derivation chain consists of design choices instantiated directly in the open-source implementation rather than reducing to self-citation, self-definition, or renamed inputs. The cited conceptual paper is treated as external motivation, not a load-bearing uniqueness theorem or ansatz smuggled in. This is a standard non-circular design/implementation paper.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption C++20 concepts can be refined to formalize MPI data buffers and procedures while remaining compatible with the MPI standard
Reference graph
Works this paper leans on
-
[1]
Avans, C.N., Ciesko, J., Pearson, C., Suggs, E.D., Olivier, S.L., Skjellum, A.: Performance insights into supporting Kokkos views in the Kokkos Comm MPI library. In: IEEE CLUSTER Workshops. pp. 186–187 (2024). https: //doi.org/10.1109/CLUSTERWorkshops61563.2024.00051, https://github.com/ kokkos/kokkos-comm
-
[2]
Avans, C.N., Correa, A.A., Ghosh, S., Schimek, M., Schuchart, J., Skjellum, A., Suggs, E.D., Uhl, T.N.: Concepts for designing modern C++ interfaces for MPI. In: EuroMPI. pp. 165–183. Lecture Notes in Computer Science, Springer (2025). https://doi.org/10.1007/978-3-032-07194-1_10
-
[3]
Bauke, H.: MPL - a message passing library (2015),https://github.com/rabauke/ mpl
2015
-
[4]
In: GPU computing gems Jade edition, pp
Bell, N., Hoberock, J.: Thrust: A productivity-oriented library for cuda. In: GPU computing gems Jade edition, pp. 359–371. Elsevier (2012)
2012
-
[5]
Beni, M.S., Crisci, L., Cosenza, B.: EMPI: Enhanced Message Passing Interface in Modern C++. In: IEEE/ACM CCGrid. pp. 141–153 (2023).https://doi.org/10. 1109/CCGrid57682.2023.00023
arXiv 2023
-
[6]
Proceedings of the JuliaCon Conferences1(1), 68 (2021).https://doi
Byrne, S., Wilcox, L.C., Churavy, V.: Mpi.jl: Julia bindings for the message passing interface. Proceedings of the JuliaCon Conferences1(1), 68 (2021).https://doi. org/10.21105/jcon.00068,https://doi.org/10.21105/jcon.00068
-
[7]
Standard Proposal P2996R13, ISO/IEC JTC1/SC22/WG21 (2025), https://www.open-std.org/jtc1/sc22/wg21/docs/ papers/2025/p2996r13.html
Childers, W., Dimov, P., Katz, D., Revzin, B., Sutton, A., Vali, F., Vande- voorde, D.: Reflection for C++26. Standard Proposal P2996R13, ISO/IEC JTC1/SC22/WG21 (2025), https://www.open-std.org/jtc1/sc22/wg21/docs/ papers/2025/p2996r13.html
2025
-
[8]
Correa, A.A.: B-MPI3 (2018),https://github.com/LLNL/b-mpi3
2018
-
[9]
Demiralp, A.C., Martin, P., Sakic, N., Krüger, M., Gerrits, T.: A C++20 interface for MPI 4.0. CoRRabs/2306.11840(2023)
arXiv 2023
-
[10]
Journal of Par- allel and Distributed Computing pp
Edwards, H.C., Trott, C.R., Sunderland, D.: Kokkos: Enabling manycore perfor- mance portability through polymorphic memory access patterns. Journal of Par- allel and Distributed Computing pp. 3202–3216 (2014).https://doi.org/https: //doi.org/10.1016/j.jpdc.2014.07.003, domain-Specific Languages and High- Level Frameworks for High-Performance Computing Con...
-
[11]
In: 2021 Workshop on Exascale MPI (ExaMPI)
Ghosh, S., Alsobrooks, C., Rüfenacht, M., Skjellum, A., Bangalore, P.V., Lumsdaine, A.: Towards modern C++ language support for MPI. In: 2021 Workshop on Exascale MPI (ExaMPI). pp. 27–35. IEEE (2021)
2021
-
[12]
Gregor, D., Troyer, M.: Boost.MPI (2005–2007),https://www.boost.org/doc/ libs/1_84_0/doc/html/mpi.html, version 1.84
2005
-
[13]
Message Passing Interface Forum: MPI: A Message-Passing Interface Standard Ver- sion 5.0 (Jun 2025),https://www.mpi-forum.org/docs/mpi-5.0/mpi50-report. pdf
2025
-
[14]
Standard Proposal P0896R4, ISO/IEC JTC1/SC22/WG21 (Nov 2018),https://www.open-std.org/ jtc1/sc22/wg21/docs/papers/2018/p0896r4.pdf
Niebler, E., Carter, C., Di Bella, C.: The one ranges proposal. Standard Proposal P0896R4, ISO/IEC JTC1/SC22/WG21 (Nov 2018),https://www.open-std.org/ jtc1/sc22/wg21/docs/papers/2018/p0896r4.pdf
2018
-
[15]
com/NVIDIA/cccl, part of the CUDA Core Compute Libraries (CCCL)
NVIDIA: Thrust: The C++ parallel algorithms library (2025),https://github. com/NVIDIA/cccl, part of the CUDA Core Compute Libraries (CCCL)
2025
-
[16]
In: Stotzka, R., Schiffers, M., Cotronis, Y
Pellegrini, S., Prodan, R., Fahringer, T.: A lightweight C++ interface to MPI. In: Stotzka, R., Schiffers, M., Cotronis, Y. (eds.) Proc. of the 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). pp. 3–10. IEEE (2012).https://doi.org/10.1109/PDP.2012.42
-
[17]
Polukhin, A.: Boost.pfr (2016), https://www.boost.org/doc/libs/1_84_0/doc/ html/boost_pfr.html
2016
-
[18]
Skjellum, A., Wooley, D.G., Lu, Z., Wolf, M., Bangalore, P.V., Lumsdaine, A., Squyres, J.M., McCandless, B.: Object-oriented analysis and design of the message passing interface. Concurrency and Computation: Practice and Experience13(4), 245–292 (2001).https://doi.org/https://doi.org/10.1002/cpe.556, https:// onlinelibrary.wiley.com/doi/abs/10.1002/cpe.556
-
[19]
Steinbusch, B., Gaspar, A., Brown, J.: rsmpi - MPI bindings for rust (2015),https: //github.com/rsmpi/rsmpi
2015
-
[20]
Addison-Wesley (1994)
Stroustrup, B.: The design and evolution of C++. Addison-Wesley (1994)
1994
-
[21]
github.io/CppCoreGuidelines/CppCoreGuidelines.html
Stroustrup, B., Sutter, H., et al.: C++ core guidelines (2024),https://isocpp. github.io/CppCoreGuidelines/CppCoreGuidelines.html
2024
-
[22]
Specification Revision 11, The Khronos Group Inc
The Khronos SYCL Working Group: SYCL 2020 specification. Specification Revision 11, The Khronos Group Inc. (2025),https://registry.khronos.org/SYCL/specs/ sycl-2020/html/sycl-2020.html
2020
-
[23]
Uhl, T.N., Schimek, M., Hübner, L., Hespe, D., Kurpicz, F., Seemaier, D., Stelz, C., Sanders, P.: KaMPIng: Flexible and (near) zero-overhead C++ bindings for MPI. In: Intl. Conf. for High Performance Computing, Networking, Storage, and Analysis (SC). IEEE (2024).https://doi.org/10.1109/SC41406.2024.00050
work page internal anchor Pith review Pith/arXiv arXiv doi:10.1109/sc41406.2024.00050 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.