Dynamic Resource Management in Production HPC Clusters

Antonio J. Pe\~na; Guillaume Houzeaux; Petter Sand{\aa}s; Sergio Iserte

arxiv: 2606.13266 · v1 · pith:UYW65QHWnew · submitted 2026-06-11 · 💻 cs.DC

Dynamic Resource Management in Production HPC Clusters

Petter Sand{\aa}s , Sergio Iserte , Guillaume Houzeaux , Antonio J. Pe\~na This is my paper

Pith reviewed 2026-06-27 05:51 UTC · model grok-4.3

classification 💻 cs.DC

keywords MPI malleabilitydynamic resource managementHPC clustersproduction systemsnode-hour consumptionDMR frameworknon-invasive integration

0 comments

The pith

A malleability framework lets HPC applications resize MPI jobs at runtime on production clusters without code changes or scheduler tweaks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Dynamic Management of Resources (DMR) framework to bring dynamic resource allocation to real-world high-performance computing. It shows how to integrate malleability into existing MPI applications so they can grow or shrink their process count during execution. Tests on two large scientific codes across three TOP500 machines under normal production rules produced runtimes close to fixed-size baselines. At the same time the approach cut total node-hours needed to complete the same workloads. A reader would care because this removes the main obstacles that have kept dynamic techniques out of everyday HPC operations.

Core claim

The DMR framework supplies a non-invasive MPI malleability methodology that works with current HPC software stacks and vanilla resource managers. When the API is added to two production-scale scientific applications, the resulting jobs run on three TOP500 systems with performance comparable to static allocations while consuming substantially fewer node-hours for identical workloads.

What carries the argument

The Dynamic Management of Resources (DMR) framework and its API, which supplies the mechanism for runtime MPI process resizing without intrusive application changes or scheduler modifications.

If this is right

Malleability becomes usable on production systems that employ standard resource managers.
Node-hour consumption drops for workloads whose demand varies over time.
The barrier to adopting dynamic resource management in HPC is lowered.
Existing applications can exploit malleability without scheduler modifications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applications with phases of high and low parallelism could automatically match resources to instantaneous need.
Similar lightweight APIs might be developed for other parallel programming models beyond MPI.
Production schedulers could eventually incorporate malleability signals to improve overall cluster utilization.
Energy and power budgets on large systems could be managed more precisely by shrinking allocations during low-demand periods.

Load-bearing premise

The DMR API can be added to existing large-scale scientific applications without requiring major code changes.

What would settle it

An integration attempt on a third production application that demands extensive code rewrites or produces noticeably slower runtimes than the static baseline under the same production configuration.

Figures

Figures reproduced from arXiv: 2606.13266 by Antonio J. Pe\~na, Guillaume Houzeaux, Petter Sand{\aa}s, Sergio Iserte.

**Figure 2.** Figure 2: DMRv2 design and working modes. As a result, once Slurm4DMR had been launched under a fixed reservation, resource requests could be satisfied almost instantly. In production supercomputers, however, resources are heavily contended; job submissions may wait in queues for non-trivial and non-deterministic periods of time, and a global view of cluster status is often unavailable to users (e.g., on the MareNos… view at source ↗

**Figure 3.** Figure 3: CE (top) and node count (bottom) per time step of Alya malleable executions on Leonardo. Left plots correspond to the [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: MPDATA malleable on MN5 ACC with ROUND_POLICY indicating the number of time steps needed for each reconfiguration until resources are available. The gray dashed line shows a Slurm4DMR-based execution and the solid blue line shows a DMR@Jobs-based execution. D. Marenostrum 5 GPP – TOP500 #45 Similarly to the previous experiment in Section V-B, this experiment evaluates how DMRv2’s CE-based resizing behaves… view at source ↗

**Figure 5.** Figure 5: CE (top) and node count (bottom) per time step of Alya malleable executions on MN5 GPP. Left plots correspond to the [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Node allocation over time for a 50-job Alya workload on MN5 GPP [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Example of the timeline of a job’s stages. [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗

read the original abstract

Many large-scale scientific applications exhibit time-varying behavior, yet production HPC clusters still rely on rigid, fixed-size allocations, and most dynamic techniques remain confined to laboratory prototypes. This work presents a practical MPI malleability methodology that integrates with state-of-the-art high-performance computing (HPC) software stacks and operational practices. The methodology is implemented in the Dynamic Management of Resources (DMR) framework and is designed to ease adoption by existing applications without requiring intrusive code changes or scheduler modifications. We evaluate our approach by integrating the DMR API into two large-scale scientific applications and deploying them on three TOP500 supercomputers under realistic production configurations. Our non-invasive malleability solution achieves performance comparable to static baselines in controlled environments while substantially reducing node-hour consumption for identical workloads. These results show that malleability can be effectively exploited on production systems using vanilla resource managers, lowering the barrier to adoption of dynamic resource management in HPC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DMR gives a concrete, scheduler-agnostic way to add malleability to real MPI codes on production systems, but the non-invasiveness and performance claims rest on high-level assertions without visible numbers or integration details.

read the letter

The main takeaway is that this paper introduces the DMR framework and API as a practical implementation for dynamic resource management that works with existing HPC stacks and does not require scheduler modifications.

It does a solid job moving malleability out of prototypes by integrating the approach into two large scientific applications and running them on three TOP500 machines under realistic production settings. The reported outcome—performance close to static baselines with lower node-hour use—is the kind of engineering result that could matter for adoption.

The soft spots are around evidence. The abstract states the non-invasive claim and the performance gains but supplies no quantitative data, error bars, code change sizes, or method details. The stress-test note correctly flags that without concrete proof the integrations stayed truly non-invasive under production constraints, the headline result is hard to evaluate. If the full paper has those specifics and reproducible numbers, the contribution strengthens; otherwise the central assumption stays under-supported.

This is for people in the HPC systems and resource management area who want deployable techniques rather than new theory. A reader focused on practical malleability would get value from the integration approach.

It deserves peer review because the real-system deployment angle is worth referee time even if the evaluation needs expansion.

Referee Report

2 major / 1 minor

Summary. The paper presents the Dynamic Management of Resources (DMR) framework implementing a practical MPI malleability methodology for production HPC clusters. It integrates with existing software stacks and schedulers without requiring intrusive application changes or scheduler modifications, evaluates the approach by integrating the DMR API into two large-scale scientific applications deployed on three TOP500 systems under realistic production configurations, and claims that the non-invasive solution achieves performance comparable to static baselines while substantially reducing node-hour consumption for identical workloads.

Significance. If the non-invasive integration claims and quantitative performance/resource results hold, the work would be significant for bridging the gap between laboratory prototypes and production HPC, as it demonstrates malleability using vanilla resource managers and could reduce barriers to adoption for time-varying scientific applications.

major comments (2)

[Abstract] Abstract: the central claims of 'performance comparable to static baselines' and 'substantially reducing node-hour consumption' are asserted without any quantitative metrics, error bars, specific numbers, or details on evaluation methods, which is load-bearing because the headline result cannot be assessed for magnitude or statistical validity from the provided information.
[Evaluation] Evaluation (deployment on two applications and three TOP500 systems): the non-invasiveness of DMR API integration is asserted as requiring no intrusive code changes under realistic production configurations, yet no concrete evidence (e.g., lines of code modified, specific API additions, or scheduler interaction details) is supplied; this directly undermines the workload-equivalence precondition for the performance and node-hour claims.

minor comments (1)

The term 'non-invasive' and 'intrusive code changes' should be defined more precisely (e.g., with quantitative thresholds) to allow readers to judge integration effort consistently.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight opportunities to strengthen the abstract and provide more explicit evidence for our integration claims. We address each point below and will incorporate revisions to improve clarity and substantiation.

read point-by-point responses

Referee: [Abstract] Abstract: the central claims of 'performance comparable to static baselines' and 'substantially reducing node-hour consumption' are asserted without any quantitative metrics, error bars, specific numbers, or details on evaluation methods, which is load-bearing because the headline result cannot be assessed for magnitude or statistical validity from the provided information.

Authors: We agree that the abstract would benefit from including representative quantitative results to make the central claims more self-contained. While the Evaluation section of the manuscript reports the detailed performance metrics (including comparisons to static baselines with variability measures across runs) and node-hour reductions on the three TOP500 systems, we will revise the abstract to incorporate key highlights from those results along with a concise reference to the evaluation methodology. revision: yes
Referee: [Evaluation] Evaluation (deployment on two applications and three TOP500 systems): the non-invasiveness of DMR API integration is asserted as requiring no intrusive code changes under realistic production configurations, yet no concrete evidence (e.g., lines of code modified, specific API additions, or scheduler interaction details) is supplied; this directly undermines the workload-equivalence precondition for the performance and node-hour claims.

Authors: The manuscript describes the DMR framework as designed for non-intrusive integration with existing applications and vanilla schedulers. To directly address the request for concrete evidence, we will revise the Evaluation section to add explicit details on the integration process, including the extent of code modifications for each application, the specific API elements introduced, and confirmation that no scheduler changes were required. This will better document the workload-equivalence basis for the reported results. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical claims with no derivations or self-referential reductions

full rationale

The paper describes an implemented DMR framework for MPI malleability, its integration into two applications, and deployment results on three TOP500 systems. All load-bearing claims are empirical performance and resource-consumption measurements under production configurations. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The non-invasiveness assertion is presented as an observed outcome of the integration effort rather than a definitional or fitted result. This matches the default expectation for an empirical systems paper whose central results rest on external machine measurements rather than internal construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities identified from the abstract; the work relies on standard MPI and resource manager interfaces.

pith-pipeline@v0.9.1-grok · 5690 in / 935 out tokens · 17843 ms · 2026-06-27T05:51:03.800793+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 31 canonical work pages

[1]

Exascale workload characterization and architecture implications,

P. Balaprakash, D. Buntinas, A. Chan, A. Guha, R. Gupta, S. H. K. Narayanan, A. A. Chien, P. Hovland, and B. Norris, “Exascale workload characterization and architecture implications,” in2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2013, pp. 120–121. [Online]. Available: https://doi.org/10.1109/ISPASS.2013.6557153

work page doi:10.1109/ispass.2013.6557153 2013
[2]

Invasive computing: An overview,

J. Teich, J. Henkel, A. Herkersdorf, D. Schmitt-Landsiedel, W. Schr ¨oder- Preikschat, and G. Snelting, “Invasive computing: An overview,” Multiprocessor System-on-Chip: Hardware Design and Tool Integration, pp. 241–268, 2011. [Online]. Available: https://doi.org/10.1007/978-1 -4419-6460-1 11

work page doi:10.1007/978-1 2011
[3]

Packing schemes for gang scheduling,

D. G. Feitelson, “Packing schemes for gang scheduling,” inJob Scheduling Strategies for Parallel Processing, 1996, pp. 89–110. [Online]. Available: https://doi.org/10.1007/BFb0022289

work page doi:10.1007/bfb0022289 1996
[4]

A Survey on Malleability Solutions for High-Performance Distributed Computing,

J. I. Aliaga, M. Castillo, S. Iserte, I. Mart ´ın-´Alvarez, and R. Mayo, “A Survey on Malleability Solutions for High-Performance Distributed Computing,”Applied Science, vol. 12, pp. 1–32, May 2022. [Online]. Available: https://doi.org/10.3390/app12105231

work page doi:10.3390/app12105231 2022
[5]

Malleability in modern HPC systems: Current experiences, challenges, and future opportunities,

A. Tarraf, M. Schreiber, A. Cascajo, J.-B. Besnard, M.-A. Vef, D. Huber, S. Happ, A. Brinkmann, D. E. Singh, H.-C. Hoppe, A. Miranda, A. J. Pe ˜na, R. Machado, M. G. Gasulla, M. Schulz, P. Carpenter, S. Pickartz, T. Rotaru, S. Iserte, V . Lopez, J. Ejarque, H. Sirwani, and F. Wolf, “Malleability in modern HPC systems: Current experiences, challenges, and ...

work page doi:10.1109/tpds.2024.3406764 2024
[6]

Slurm: Simple linux utility for resource management,

A. B. Yoo, M. A. Jette, and M. Grondona, “Slurm: Simple linux utility for resource management,” inJob Scheduling Strategies for Parallel Processing, 2003, pp. 44–60. [Online]. Available: https://doi.org/10.1007/10968987 3

work page doi:10.1007/10968987 2003
[7]

High-throughput computation through efficient resource management,

S. Iserte, “High-throughput computation through efficient resource management,” Ph.D. Thesis, Universitat Jaume I (UJI), Spain, Nov
[8]

Available: http://dx.doi.org/10.6035/14101.2018.176272

[Online]. Available: http://dx.doi.org/10.6035/14101.2018.176272

work page doi:10.6035/14101.2018.176272 2018
[9]

Architecting malleable mpi applications for priority-driven adaptive scheduling,

P. Lemarinier, K. Hasanov, S. Venugopal, and K. Katrinis, “Architecting malleable mpi applications for priority-driven adaptive scheduling,” inProceedings of the 23rd European MPI Users’ Group Meeting, ser. EuroMPI ’16, New York, NY , USA, 2016, p. 74–81. [Online]. Available: https://doi.org/10.1145/2966884.2966907

work page doi:10.1145/2966884.2966907 2016
[10]

Autonomic malleability in iterative mpi applications,

A. C. Sena, F. S. Ribeiro, V . E. Rebello, A. P. Nascimento, and C. Boeres, “Autonomic malleability in iterative mpi applications,” in 2013 25th International Symposium on Computer Architecture and High Performance Computing, 2013, pp. 192–199. [Online]. Available: https://doi.org/10.1109/SBAC-PAD.2013.4

work page doi:10.1109/sbac-pad.2013.4 2013
[11]

Reshape: A framework for dynamic resizing and scheduling of homogeneous applications in a parallel environment,

R. Sudarsan and C. J. Ribbens, “Reshape: A framework for dynamic resizing and scheduling of homogeneous applications in a parallel environment,” in2007 International Conference on Parallel Processing (ICPP 2007), 2007, pp. 44–44. [Online]. Available: https://doi.org/10.1109/ICPP.2007.73

work page doi:10.1109/icpp.2007.73 2007
[12]

Dynamic malleability in iterative mpi applications,

K. E. Maghraoui, T. Desell, B. K. Szyma ´nski, and C. A. Varela, “Dynamic malleability in iterative mpi applications,”Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid ’07), pp. 591–598, 2007. [Online]. Available: https: //api.semanticscholar.org/CorpusID:2784887

2007
[13]

Towards realizing the potential of malleable jobs,

A. Gupta, B. Acun, O. Sarood, and L. V . Kal ´e, “Towards realizing the potential of malleable jobs,” in2014 21st International Conference on High Performance Computing (HiPC), 2014, pp. 1–10. [Online]. Available: https://doi.org/10.1109/HiPC.2014.7116905

work page doi:10.1109/hipc.2014.7116905 2014
[14]

Flex-mpi: an mpi extension for supporting dynamic load balancing on heterogeneous non-dedicated systems,

G. Mart ´ın, M.-C. Marinescu, D. E. Singh, and J. Carretero, “Flex-mpi: an mpi extension for supporting dynamic load balancing on heterogeneous non-dedicated systems,” inProceedings of the 19th International Conference on Parallel Processing, ser. Euro-Par’13, 2013, p. 138–149. [Online]. Available: https://doi.org/10.1007/978-3-642-40047-6 16

work page doi:10.1007/978-3-642-40047-6 2013
[15]

Infrastructure and api extensions for elastic execution of mpi applications,

I. Compr ´es, A. Mo-Hellenbrand, M. Gerndt, and H.-J. Bungartz, “Infrastructure and api extensions for elastic execution of mpi applications,” inProceedings of the 23rd European MPI Users’ Group Meeting, ser. EuroMPI ’16, 2016, p. 82–97. [Online]. Available: https://doi.org/10.1145/2966884.2966917

work page doi:10.1145/2966884.2966917 2016
[16]

Extending SLURM for dynamic resource-aware adaptive batch scheduling,

M. Chadha, J. John, and M. Gerndt, “Extending SLURM for dynamic resource-aware adaptive batch scheduling,” inIEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC), 2020, pp. 223–232. [Online]. Available: https://doi.org/10.1109/HiPC50609.2020.00036

work page doi:10.1109/hipc50609.2020.00036 2020
[17]

Dynamic resource management for elastic scientific workflows using PMIx,

R. Bhattarai, H. Pritchard, and S. Ghafoor, “Dynamic resource management for elastic scientific workflows using PMIx,” inIEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2024. [Online]. Available: https://doi.ieeecomputersociety. org/10.1109/IPDPSW63119.2024.00131

work page doi:10.1109/ipdpsw63119.2024.00131 2024
[18]

Dynamic resource allocation for efficient parallel CFD simulations,

G. Houzeaux, R. M. Badia, R. Borrell, D. Dosimont, J. Ejarque, M. Garcia-Gasulla, and V . L ´opez, “Dynamic resource allocation for efficient parallel CFD simulations,”Computers & Fluids, vol. 245, p. 105577, Sep. 2022. [Online]. Available: https://doi.org/10.1016/j.comp fluid.2022.105577

work page doi:10.1016/j.comp 2022
[19]

Resource optimization with MPI process malleability for dynamic workloads in HPC clusters,

S. Iserte, I. Mart ´ın-´Alvarez, K. Rojek, J. I. Aliaga, M. Castillo, W. Folwarska, and A. J. Pe ˜na, “Resource optimization with MPI process malleability for dynamic workloads in HPC clusters,”Future Generation Computer Systems, p. 107949, 2025. [Online]. Available: https://doi.org/10.1016/j.future.2025.107949

work page doi:10.1016/j.future.2025.107949 2025
[20]

Dynamic spawning of mpi processes applied to malleability,

I. Mart ´ın-´Alvarez, J. I. Aliaga, M. Castillo, S. Iserte, and R. Mayo, “Dynamic spawning of mpi processes applied to malleability,” International Journal of High Performance Computing Applications, vol. 38, no. 2, p. 69–93, Mar. 2024. [Online]. Available: https: //doi.org/10.1177/10943420231176527

work page doi:10.1177/10943420231176527 2024
[21]

Design Principles of Dynamic Resource Management for High- Performance Parallel Programming Models,

D. Huber, M. Schreiber, M. Schulz, H. Pritchard, and D. Holmes, “Design Principles of Dynamic Resource Management for High- Performance Parallel Programming Models,” 2024. [Online]. Available: https://arxiv.org/abs/2403.17107

arXiv 2024
[22]

DMR API: Improving cluster productivity by turning applications into malleable,

S. Iserte, R. Mayo, E. S. Quintana-Ort ´ı, V . Beltran, and A. J. Pe ˜na, “DMR API: Improving cluster productivity by turning applications into malleable,”Parallel Computing, vol. 78, pp. 54–66, 2018. [Online]. Available: https://doi.org/10.1016/j.parco.2018.07.006

work page doi:10.1016/j.parco.2018.07.006 2018
[23]

Talp: A lightweight tool to unveil parallel efficiency of large-scale executions,

V . Lopez, G. Ramirez Miranda, and M. Garcia-Gasulla, “Talp: A lightweight tool to unveil parallel efficiency of large-scale executions,” inProceedings of the 2021 on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn STrategy, ser. PERMA VOST ’21, 2021, p. 3–10. [Online]. Available: https://doi.org/10.1145/3452412.3462753

work page doi:10.1145/3452412.3462753 2021
[24]

MPI Malleability Validation under Replayed Real-World HPC Conditions,

S. Iserte, M. Madon, G. Da Costa, J.-M. Pierson, and A. J. Pe ˜na, “MPI Malleability Validation under Replayed Real-World HPC Conditions,” Future Generation Computer Systems, p. 108305, Dec. 2025. [Online]. Available: https://doi.org/10.1016/j.future.2025.108305

work page doi:10.1016/j.future.2025.108305 2025
[25]

Using an adaptive hpc runtime system to reconfigure the cache hierarchy,

E. Totoni, J. Torrellas, and L. V . Kale, “Using an adaptive hpc runtime system to reconfigure the cache hierarchy,” inSC ’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014, pp. 1047–1058. [Online]. Available: https://doi.org/10.1109/SC.2014.90

work page doi:10.1109/sc.2014.90 2014
[26]

Predicting output performance of a petascale supercomputer,

B. Xie, Y . Huang, J. S. Chase, J. Y . Choi, S. Klasky, J. Lofstead, and S. Oral, “Predicting output performance of a petascale supercomputer,” inProceedings of the 26th International Symposium on High- Performance Parallel and Distributed Computing, 2017, p. 181–192. [Online]. Available: https://doi.org/10.1145/3078597.3078614

work page doi:10.1145/3078597.3078614 2017
[27]

In: 2025 33rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)

J. Ekelund, S. Markidis, and I. Peng, “Boosting performance of iterative applications on gpus: Kernel batching with cuda graphs,” in33rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2025, pp. 70–77. [Online]. Available: https://doi.org/10.1109/PDP66500.2025.00019

work page doi:10.1109/pdp66500.2025.00019 2025
[28]

Malleable computational fluid dynamics simulations,

S. Iserte, G. Houzeaux, P. Sand ˚as, A. J. Pe ˜na, and M. Garcia-Gasulla, “Malleable computational fluid dynamics simulations,” inProceedings of the 36th Parallel CFD International Conference, Merida, Yucatan, Mexico, Nov. 2025, [In-press]

2025
[29]

Alya: Multiphysics Engineering Simulation Towards Exascale,

M. V ´azquez, G. Houzeaux, S. Koric, A. Artigues, J. Aguado- Sierra, R. Ar ´ıs, D. Mira, H. Calmet, F. Cucchietti, H. Owen, A. Taha, E. D. Burness, J. M. Cela, and M. Valero, “Alya: Multiphysics Engineering Simulation Towards Exascale,”Journal of Computational Sciences, vol. 14, pp. 15–27, 2016. [Online]. Available: https://doi.org/10.1016/j.jocs.2015.12.007

work page doi:10.1016/j.jocs.2015.12.007 2016
[30]

A low-dissipation finite element scheme for scale resolving simulations of turbulent flows,

O. Lehmkuhl, G. Houzeaux, H. Owen, G. Chrysokentis, and I. Rodriguez, “A low-dissipation finite element scheme for scale resolving simulations of turbulent flows,”Journal of Computational Physics, vol. 390, pp. 51–65, 2019. [Online]. Available: https: //doi.org/10.1016/j.jcp.2019.04.004

work page doi:10.1016/j.jcp.2019.04.004 2019
[31]

Parallelization of 3d mpdata algorithm using many graphics processors,

K. Rojek and R. Wyrzykowski, “Parallelization of 3d mpdata algorithm using many graphics processors,” inParallel Computing Technologies, 2015, pp. 445–457. [Online]. Available: https://doi.org/10.1007/978-3 -319-21909-7 43

work page doi:10.1007/978-3 2015
[32]

A study of the effect of process malleability in the energy efficiency on GPU-based clusters,

S. Iserte and K. Rojek, “A study of the effect of process malleability in the energy efficiency on GPU-based clusters,”Journal of Supercomputing, vol. 76, pp. 255–274, Oct. 2020. [Online]. Available: https://doi.org/10.1007/s11227-019-03034-x

work page doi:10.1007/s11227-019-03034-x 2020
[33]

Dynamic resource management in HPC systems using dynamic processes with PSets,

D. Huber, S. Iserte, M. Schreiber, P.-F. Dutot, O. Richard, A. J. Pe ˜na, K. Gaddameedi, and T. Neckel, “Dynamic resource management in HPC systems using dynamic processes with PSets,” inProceedings of the 32nd IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), Hyderabad, India, Dec. 2025. [Online]. Available: https:/...

work page doi:10.1109/hipc66333.2025.00036 2025
[35]

A Layered Approach for Dynamic Resource Management in HPC,

H.-J. Bungartz, P.-F. Dutot, J. Fecht, K. Gaddameedi, D. Huber, S. Iserte, M. Minion, T. Neckel, A. Pe ˜na, O. Richard, M. Schreiber, M. Schulz, and V . Sch ¨uller, “A Layered Approach for Dynamic Resource Management in HPC,” inEuro-Par 2024: Parallel Processing Workshops: Euro-Par 2024 International Workshops, Madrid, Spain, August 26–30, 2024, Proceedin...

work page doi:10.1109/icppw.2014.15 2024
[36]

Preparing mpich for exascale,

Y . Guo, K. Raffenetti, H. Zhou, P. Balaji, M. Si, A. Amer, S. Iwasaki, S. Seo, G. Congiu, R. Latham, L. Oden, T. Gillis, R. Zambre, K. Ouyang, C. Archer, W. Bland, J. Jose, S. Sur, H. Fujita, D. Durnov, M. Chuvelev, G. Zheng, A. Brooks, S. Thapaliya, T. Doodi, M. Garazan, S. Oyanagi, M. Snir, and R. Thakur, “Preparing mpich for exascale,”The Internationa...

work page doi:10.1177/10943420241311608 2025

[1] [1]

Exascale workload characterization and architecture implications,

P. Balaprakash, D. Buntinas, A. Chan, A. Guha, R. Gupta, S. H. K. Narayanan, A. A. Chien, P. Hovland, and B. Norris, “Exascale workload characterization and architecture implications,” in2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2013, pp. 120–121. [Online]. Available: https://doi.org/10.1109/ISPASS.2013.6557153

work page doi:10.1109/ispass.2013.6557153 2013

[2] [2]

Invasive computing: An overview,

J. Teich, J. Henkel, A. Herkersdorf, D. Schmitt-Landsiedel, W. Schr ¨oder- Preikschat, and G. Snelting, “Invasive computing: An overview,” Multiprocessor System-on-Chip: Hardware Design and Tool Integration, pp. 241–268, 2011. [Online]. Available: https://doi.org/10.1007/978-1 -4419-6460-1 11

work page doi:10.1007/978-1 2011

[3] [3]

Packing schemes for gang scheduling,

D. G. Feitelson, “Packing schemes for gang scheduling,” inJob Scheduling Strategies for Parallel Processing, 1996, pp. 89–110. [Online]. Available: https://doi.org/10.1007/BFb0022289

work page doi:10.1007/bfb0022289 1996

[4] [4]

A Survey on Malleability Solutions for High-Performance Distributed Computing,

J. I. Aliaga, M. Castillo, S. Iserte, I. Mart ´ın-´Alvarez, and R. Mayo, “A Survey on Malleability Solutions for High-Performance Distributed Computing,”Applied Science, vol. 12, pp. 1–32, May 2022. [Online]. Available: https://doi.org/10.3390/app12105231

work page doi:10.3390/app12105231 2022

[5] [5]

Malleability in modern HPC systems: Current experiences, challenges, and future opportunities,

A. Tarraf, M. Schreiber, A. Cascajo, J.-B. Besnard, M.-A. Vef, D. Huber, S. Happ, A. Brinkmann, D. E. Singh, H.-C. Hoppe, A. Miranda, A. J. Pe ˜na, R. Machado, M. G. Gasulla, M. Schulz, P. Carpenter, S. Pickartz, T. Rotaru, S. Iserte, V . Lopez, J. Ejarque, H. Sirwani, and F. Wolf, “Malleability in modern HPC systems: Current experiences, challenges, and ...

work page doi:10.1109/tpds.2024.3406764 2024

[6] [6]

Slurm: Simple linux utility for resource management,

A. B. Yoo, M. A. Jette, and M. Grondona, “Slurm: Simple linux utility for resource management,” inJob Scheduling Strategies for Parallel Processing, 2003, pp. 44–60. [Online]. Available: https://doi.org/10.1007/10968987 3

work page doi:10.1007/10968987 2003

[7] [7]

High-throughput computation through efficient resource management,

S. Iserte, “High-throughput computation through efficient resource management,” Ph.D. Thesis, Universitat Jaume I (UJI), Spain, Nov

[8] [8]

Available: http://dx.doi.org/10.6035/14101.2018.176272

[Online]. Available: http://dx.doi.org/10.6035/14101.2018.176272

work page doi:10.6035/14101.2018.176272 2018

[9] [9]

Architecting malleable mpi applications for priority-driven adaptive scheduling,

P. Lemarinier, K. Hasanov, S. Venugopal, and K. Katrinis, “Architecting malleable mpi applications for priority-driven adaptive scheduling,” inProceedings of the 23rd European MPI Users’ Group Meeting, ser. EuroMPI ’16, New York, NY , USA, 2016, p. 74–81. [Online]. Available: https://doi.org/10.1145/2966884.2966907

work page doi:10.1145/2966884.2966907 2016

[10] [10]

Autonomic malleability in iterative mpi applications,

A. C. Sena, F. S. Ribeiro, V . E. Rebello, A. P. Nascimento, and C. Boeres, “Autonomic malleability in iterative mpi applications,” in 2013 25th International Symposium on Computer Architecture and High Performance Computing, 2013, pp. 192–199. [Online]. Available: https://doi.org/10.1109/SBAC-PAD.2013.4

work page doi:10.1109/sbac-pad.2013.4 2013

[11] [11]

Reshape: A framework for dynamic resizing and scheduling of homogeneous applications in a parallel environment,

R. Sudarsan and C. J. Ribbens, “Reshape: A framework for dynamic resizing and scheduling of homogeneous applications in a parallel environment,” in2007 International Conference on Parallel Processing (ICPP 2007), 2007, pp. 44–44. [Online]. Available: https://doi.org/10.1109/ICPP.2007.73

work page doi:10.1109/icpp.2007.73 2007

[12] [12]

Dynamic malleability in iterative mpi applications,

K. E. Maghraoui, T. Desell, B. K. Szyma ´nski, and C. A. Varela, “Dynamic malleability in iterative mpi applications,”Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid ’07), pp. 591–598, 2007. [Online]. Available: https: //api.semanticscholar.org/CorpusID:2784887

2007

[13] [13]

Towards realizing the potential of malleable jobs,

A. Gupta, B. Acun, O. Sarood, and L. V . Kal ´e, “Towards realizing the potential of malleable jobs,” in2014 21st International Conference on High Performance Computing (HiPC), 2014, pp. 1–10. [Online]. Available: https://doi.org/10.1109/HiPC.2014.7116905

work page doi:10.1109/hipc.2014.7116905 2014

[14] [14]

Flex-mpi: an mpi extension for supporting dynamic load balancing on heterogeneous non-dedicated systems,

G. Mart ´ın, M.-C. Marinescu, D. E. Singh, and J. Carretero, “Flex-mpi: an mpi extension for supporting dynamic load balancing on heterogeneous non-dedicated systems,” inProceedings of the 19th International Conference on Parallel Processing, ser. Euro-Par’13, 2013, p. 138–149. [Online]. Available: https://doi.org/10.1007/978-3-642-40047-6 16

work page doi:10.1007/978-3-642-40047-6 2013

[15] [15]

Infrastructure and api extensions for elastic execution of mpi applications,

I. Compr ´es, A. Mo-Hellenbrand, M. Gerndt, and H.-J. Bungartz, “Infrastructure and api extensions for elastic execution of mpi applications,” inProceedings of the 23rd European MPI Users’ Group Meeting, ser. EuroMPI ’16, 2016, p. 82–97. [Online]. Available: https://doi.org/10.1145/2966884.2966917

work page doi:10.1145/2966884.2966917 2016

[16] [16]

Extending SLURM for dynamic resource-aware adaptive batch scheduling,

M. Chadha, J. John, and M. Gerndt, “Extending SLURM for dynamic resource-aware adaptive batch scheduling,” inIEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC), 2020, pp. 223–232. [Online]. Available: https://doi.org/10.1109/HiPC50609.2020.00036

work page doi:10.1109/hipc50609.2020.00036 2020

[17] [17]

Dynamic resource management for elastic scientific workflows using PMIx,

R. Bhattarai, H. Pritchard, and S. Ghafoor, “Dynamic resource management for elastic scientific workflows using PMIx,” inIEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2024. [Online]. Available: https://doi.ieeecomputersociety. org/10.1109/IPDPSW63119.2024.00131

work page doi:10.1109/ipdpsw63119.2024.00131 2024

[18] [18]

Dynamic resource allocation for efficient parallel CFD simulations,

G. Houzeaux, R. M. Badia, R. Borrell, D. Dosimont, J. Ejarque, M. Garcia-Gasulla, and V . L ´opez, “Dynamic resource allocation for efficient parallel CFD simulations,”Computers & Fluids, vol. 245, p. 105577, Sep. 2022. [Online]. Available: https://doi.org/10.1016/j.comp fluid.2022.105577

work page doi:10.1016/j.comp 2022

[19] [19]

Resource optimization with MPI process malleability for dynamic workloads in HPC clusters,

S. Iserte, I. Mart ´ın-´Alvarez, K. Rojek, J. I. Aliaga, M. Castillo, W. Folwarska, and A. J. Pe ˜na, “Resource optimization with MPI process malleability for dynamic workloads in HPC clusters,”Future Generation Computer Systems, p. 107949, 2025. [Online]. Available: https://doi.org/10.1016/j.future.2025.107949

work page doi:10.1016/j.future.2025.107949 2025

[20] [20]

Dynamic spawning of mpi processes applied to malleability,

I. Mart ´ın-´Alvarez, J. I. Aliaga, M. Castillo, S. Iserte, and R. Mayo, “Dynamic spawning of mpi processes applied to malleability,” International Journal of High Performance Computing Applications, vol. 38, no. 2, p. 69–93, Mar. 2024. [Online]. Available: https: //doi.org/10.1177/10943420231176527

work page doi:10.1177/10943420231176527 2024

[21] [21]

Design Principles of Dynamic Resource Management for High- Performance Parallel Programming Models,

D. Huber, M. Schreiber, M. Schulz, H. Pritchard, and D. Holmes, “Design Principles of Dynamic Resource Management for High- Performance Parallel Programming Models,” 2024. [Online]. Available: https://arxiv.org/abs/2403.17107

arXiv 2024

[22] [22]

DMR API: Improving cluster productivity by turning applications into malleable,

S. Iserte, R. Mayo, E. S. Quintana-Ort ´ı, V . Beltran, and A. J. Pe ˜na, “DMR API: Improving cluster productivity by turning applications into malleable,”Parallel Computing, vol. 78, pp. 54–66, 2018. [Online]. Available: https://doi.org/10.1016/j.parco.2018.07.006

work page doi:10.1016/j.parco.2018.07.006 2018

[23] [23]

Talp: A lightweight tool to unveil parallel efficiency of large-scale executions,

V . Lopez, G. Ramirez Miranda, and M. Garcia-Gasulla, “Talp: A lightweight tool to unveil parallel efficiency of large-scale executions,” inProceedings of the 2021 on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn STrategy, ser. PERMA VOST ’21, 2021, p. 3–10. [Online]. Available: https://doi.org/10.1145/3452412.3462753

work page doi:10.1145/3452412.3462753 2021

[24] [24]

MPI Malleability Validation under Replayed Real-World HPC Conditions,

S. Iserte, M. Madon, G. Da Costa, J.-M. Pierson, and A. J. Pe ˜na, “MPI Malleability Validation under Replayed Real-World HPC Conditions,” Future Generation Computer Systems, p. 108305, Dec. 2025. [Online]. Available: https://doi.org/10.1016/j.future.2025.108305

work page doi:10.1016/j.future.2025.108305 2025

[25] [25]

Using an adaptive hpc runtime system to reconfigure the cache hierarchy,

E. Totoni, J. Torrellas, and L. V . Kale, “Using an adaptive hpc runtime system to reconfigure the cache hierarchy,” inSC ’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014, pp. 1047–1058. [Online]. Available: https://doi.org/10.1109/SC.2014.90

work page doi:10.1109/sc.2014.90 2014

[26] [26]

Predicting output performance of a petascale supercomputer,

B. Xie, Y . Huang, J. S. Chase, J. Y . Choi, S. Klasky, J. Lofstead, and S. Oral, “Predicting output performance of a petascale supercomputer,” inProceedings of the 26th International Symposium on High- Performance Parallel and Distributed Computing, 2017, p. 181–192. [Online]. Available: https://doi.org/10.1145/3078597.3078614

work page doi:10.1145/3078597.3078614 2017

[27] [27]

In: 2025 33rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)

J. Ekelund, S. Markidis, and I. Peng, “Boosting performance of iterative applications on gpus: Kernel batching with cuda graphs,” in33rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2025, pp. 70–77. [Online]. Available: https://doi.org/10.1109/PDP66500.2025.00019

work page doi:10.1109/pdp66500.2025.00019 2025

[28] [28]

Malleable computational fluid dynamics simulations,

S. Iserte, G. Houzeaux, P. Sand ˚as, A. J. Pe ˜na, and M. Garcia-Gasulla, “Malleable computational fluid dynamics simulations,” inProceedings of the 36th Parallel CFD International Conference, Merida, Yucatan, Mexico, Nov. 2025, [In-press]

2025

[29] [29]

Alya: Multiphysics Engineering Simulation Towards Exascale,

M. V ´azquez, G. Houzeaux, S. Koric, A. Artigues, J. Aguado- Sierra, R. Ar ´ıs, D. Mira, H. Calmet, F. Cucchietti, H. Owen, A. Taha, E. D. Burness, J. M. Cela, and M. Valero, “Alya: Multiphysics Engineering Simulation Towards Exascale,”Journal of Computational Sciences, vol. 14, pp. 15–27, 2016. [Online]. Available: https://doi.org/10.1016/j.jocs.2015.12.007

work page doi:10.1016/j.jocs.2015.12.007 2016

[30] [30]

A low-dissipation finite element scheme for scale resolving simulations of turbulent flows,

O. Lehmkuhl, G. Houzeaux, H. Owen, G. Chrysokentis, and I. Rodriguez, “A low-dissipation finite element scheme for scale resolving simulations of turbulent flows,”Journal of Computational Physics, vol. 390, pp. 51–65, 2019. [Online]. Available: https: //doi.org/10.1016/j.jcp.2019.04.004

work page doi:10.1016/j.jcp.2019.04.004 2019

[31] [31]

Parallelization of 3d mpdata algorithm using many graphics processors,

K. Rojek and R. Wyrzykowski, “Parallelization of 3d mpdata algorithm using many graphics processors,” inParallel Computing Technologies, 2015, pp. 445–457. [Online]. Available: https://doi.org/10.1007/978-3 -319-21909-7 43

work page doi:10.1007/978-3 2015

[32] [32]

A study of the effect of process malleability in the energy efficiency on GPU-based clusters,

S. Iserte and K. Rojek, “A study of the effect of process malleability in the energy efficiency on GPU-based clusters,”Journal of Supercomputing, vol. 76, pp. 255–274, Oct. 2020. [Online]. Available: https://doi.org/10.1007/s11227-019-03034-x

work page doi:10.1007/s11227-019-03034-x 2020

[33] [33]

Dynamic resource management in HPC systems using dynamic processes with PSets,

D. Huber, S. Iserte, M. Schreiber, P.-F. Dutot, O. Richard, A. J. Pe ˜na, K. Gaddameedi, and T. Neckel, “Dynamic resource management in HPC systems using dynamic processes with PSets,” inProceedings of the 32nd IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), Hyderabad, India, Dec. 2025. [Online]. Available: https:/...

work page doi:10.1109/hipc66333.2025.00036 2025

[34] [35]

A Layered Approach for Dynamic Resource Management in HPC,

H.-J. Bungartz, P.-F. Dutot, J. Fecht, K. Gaddameedi, D. Huber, S. Iserte, M. Minion, T. Neckel, A. Pe ˜na, O. Richard, M. Schreiber, M. Schulz, and V . Sch ¨uller, “A Layered Approach for Dynamic Resource Management in HPC,” inEuro-Par 2024: Parallel Processing Workshops: Euro-Par 2024 International Workshops, Madrid, Spain, August 26–30, 2024, Proceedin...

work page doi:10.1109/icppw.2014.15 2024

[35] [36]

Preparing mpich for exascale,

Y . Guo, K. Raffenetti, H. Zhou, P. Balaji, M. Si, A. Amer, S. Iwasaki, S. Seo, G. Congiu, R. Latham, L. Oden, T. Gillis, R. Zambre, K. Ouyang, C. Archer, W. Bland, J. Jose, S. Sur, H. Fujita, D. Durnov, M. Chuvelev, G. Zheng, A. Brooks, S. Thapaliya, T. Doodi, M. Garazan, S. Oyanagi, M. Snir, and R. Thakur, “Preparing mpich for exascale,”The Internationa...

work page doi:10.1177/10943420241311608 2025