Modeling Task Mapping for Data-intensive Applications in Heterogeneous Systems

Anna Drewes; Hanna Geppert; Martin Wilhelm; Thilo Pionteck

arxiv: 2208.06321 · v1 · submitted 2022-08-12 · 💻 cs.DC · math.OC

Modeling Task Mapping for Data-intensive Applications in Heterogeneous Systems

Martin Wilhelm , Hanna Geppert , Anna Drewes , Thilo Pionteck This is my paper

Pith reviewed 2026-05-24 11:57 UTC · model grok-4.3

classification 💻 cs.DC math.OC

keywords task mappingheterogeneous systemsmixed-integer linear programmingcommunication modelingparallelizabilitystreamabilitydata-intensive applications

0 comments

The pith

A new model for task mapping in heterogeneous systems incorporates communication between devices along with device-specific parallelizability and streamability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a model for assigning tasks to heterogeneous hardware like CPUs, GPUs and FPGAs. The model treats inter-device communication as a factor that shapes parallel execution and adds separate parameters for how readily each device can run tasks in parallel or as streams. The authors then encode the model in two mixed-integer linear programs that can be solved to produce concrete mappings. A reader would care because data-intensive workloads often run across mixed accelerators, and explicit handling of these effects can improve overall runtime predictions and assignments.

Core claim

The authors introduce a model that represents the task mapping problem by explicitly modeling communication costs between devices, the potential for parallel execution across devices, and the streamability and parallelizability specific to each device type. They demonstrate its utility by formulating two mixed-integer linear programs that optimize task assignments under these considerations.

What carries the argument

The task mapping model that incorporates communication influence on parallel execution and device-specific parallelizability and streamability.

If this is right

The model supports task mapping decisions in multiple phases of heterogeneous system design.
Two new mixed-integer linear programs become available that directly encode the model.
Communication is treated as an explicit constraint on achievable parallelism rather than an afterthought.
Device differences in streamability are represented as first-class parameters in the optimization.
pith_inferences=[

Load-bearing premise

The model correctly captures the dominant effects of inter-device communication, parallelizability, and streamability on overall execution time in a form that mixed-integer linear programming can use without essential loss of accuracy.

What would settle it

Run the two mixed-integer linear programs on a real heterogeneous platform with measured communication latencies and device throughput curves for a data-intensive workload, then compare predicted versus measured end-to-end runtimes for the produced mappings.

Figures

Figures reproduced from arXiv: 2208.06321 by Anna Drewes, Hanna Geppert, Martin Wilhelm, Thilo Pionteck.

**Figure 1.** Figure 1: Sample memoryaugmented task graph with three tasks, one source and one sink. In the hardware model, we assume that (1) each computation device is connected to (at least) one associated memory, (2) data transfer can only happen between memories (not between computation devices) and (3) the computation of a device is blocked by a memory transfer from or to the associated memory. Usually, the associated me… view at source ↗

**Figure 2.** Figure 2: Mappings found by the device-based (left) and the time-based (right) LP for a small sample graph. For each node, the parallelizability p and complexity factor c are given. At the edges and in the nodes, the time windows for transport and computation are annotated. For all tasks, the chosen input and output memory are identical, the corresponding nodes are omitted for readability. not restricted to follow a… view at source ↗

read the original abstract

We introduce a new model for the task mapping problem to aid in the systematic design of algorithms for heterogeneous systems including, but not limited to, CPUs, GPUs and FPGAs. A special focus is set on the communication between the devices, its influence on parallel execution, as well as on device-specific differences regarding parallelizability and streamability. We show how this model can be utilized in different system design phases and present two novel mixed-integer linear programs to demonstrate the usage of the model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a task-mapping model that folds in communication, parallelizability, and streamability for heterogeneous hardware plus two MILP encodings, but supplies no hardware measurements to check whether the model predicts real runtimes.

read the letter

The main point is a model for assigning tasks to mixed CPU/GPU/FPGA systems that tries to account for communication between devices, how well each device can run tasks in parallel, and whether tasks can be streamed. The authors turn this into two mixed-integer linear programs and show them on small example graphs. That is the concrete output: a modeling framework and the optimization problems built from it. It is useful to see the factors written down in one place and encoded for a solver. The illustrations make clear how the model could be applied in early design stages. The soft spot is the complete lack of any check against measured execution times. The paper stops at the formulations and toy graphs; there is no comparison of predicted versus actual runtimes on real platforms, so it is impossible to tell whether the linearizations required for the MILPs preserve the effects the model claims to capture. Minor issues include the usual need to compare against existing schedulers, but the missing validation is the larger gap. This work is for people who build or tune mapping algorithms for heterogeneous systems and want a starting point for MILP-based approaches. A reader already working in that niche could extract the MILP structures and adapt them. It is worth sending to referees because the modeling choices are explicit enough that domain experts can evaluate whether they are worth extending, even though the current version needs experiments added.

Referee Report

2 major / 1 minor

Summary. The paper introduces a new model for the task mapping problem in heterogeneous systems (CPUs, GPUs, FPGAs) with emphasis on inter-device communication, device-specific parallelizability, and streamability. It encodes the model in two novel MILPs and illustrates their use on example task graphs to support systematic algorithm design across system phases.

Significance. If the model and its MILP encodings prove accurate, the work would supply a reusable formal framework for optimizing mappings while accounting for communication overheads and device heterogeneity, which is a recognized need in data-intensive heterogeneous computing. The explicit construction of two MILPs is a concrete strength that allows direct solver-based exploration.

major comments (2)

[Abstract] Abstract and model-definition sections: the central claim that the model 'correctly captures the dominant effects' of communication, parallelizability, and streamability and that these effects 'can be expressed in MILP form without losing essential accuracy' is unsupported by any comparison of MILP-derived execution times or mappings against measured wall-clock times on real heterogeneous hardware; without such data the linearization choices cannot be verified to preserve the claimed fidelity.
[MILP formulation and examples] Illustration of MILP usage (example task graphs): the manuscript shows mappings produced by the two MILPs but reports no quantitative check that the predicted makespans or communication costs align with actual execution on CPU/GPU/FPGA platforms, leaving open the possibility that abstraction choices required for MILP tractability omit or distort the very effects the model is asserted to capture.

minor comments (1)

[Model definition] Notation for device-specific parameters (parallelism factors, streamability) should be introduced with an explicit table or list of symbols to improve readability when the MILPs are presented.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We address the major comments below regarding the lack of empirical validation on real hardware.

read point-by-point responses

Referee: [Abstract] Abstract and model-definition sections: the central claim that the model 'correctly captures the dominant effects' of communication, parallelizability, and streamability and that these effects 'can be expressed in MILP form without losing essential accuracy' is unsupported by any comparison of MILP-derived execution times or mappings against measured wall-clock times on real heterogeneous hardware; without such data the linearization choices cannot be verified to preserve the claimed fidelity.

Authors: We acknowledge that the manuscript provides no direct comparison between the MILP predictions and actual hardware measurements. The contribution of this work lies in the introduction of a task mapping model that explicitly incorporates inter-device communication, device-specific parallelizability, and streamability, along with two MILP formulations to enable its use in optimization. The illustrative examples demonstrate the model's application but are not intended as empirical validation. To align the claims with the presented evidence, we will revise the abstract to describe the model as one that accounts for these effects and can be encoded in MILP, removing assertions of 'correctly captures the dominant effects' and 'without losing essential accuracy' that imply unprovided validation. This change will be incorporated in the revised manuscript. revision: yes
Referee: [MILP formulation and examples] Illustration of MILP usage (example task graphs): the manuscript shows mappings produced by the two MILPs but reports no quantitative check that the predicted makespans or communication costs align with actual execution on CPU/GPU/FPGA platforms, leaving open the possibility that abstraction choices required for MILP tractability omit or distort the very effects the model is asserted to capture.

Authors: The referee is correct that no such quantitative alignment with real platform executions is reported. The examples serve to illustrate how the MILPs can be solved to obtain mappings under the model. Since the paper does not include hardware experiments, we cannot provide such checks here. We will revise the manuscript to explicitly state in the relevant sections that the examples are illustrative of the model's usage and that empirical validation against real executions is left for future work. This will help manage expectations regarding the fidelity of the linearizations. revision: yes

Circularity Check

0 steps flagged

Model introduced definitionally; no fitted predictions or self-citation chains

full rationale

The manuscript introduces a new model for task mapping and encodes it directly into two MILPs as an original construction. No equations, parameters, or predictions are fitted to data and then re-used as outputs; the central claims rest on the definitional adequacy of the model rather than any reduction to prior results. No load-bearing self-citations appear in the provided text, and the work contains no empirical prediction step that could collapse into its own inputs. The derivation is therefore self-contained as a modeling contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the model itself is the central addition whose internal structure remains undisclosed.

pith-pipeline@v0.9.0 · 5605 in / 940 out tokens · 31029 ms · 2026-05-24T11:57:47.692117+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages

[1]

Bektur, G.: A multi-start iterated tabu search algorithm for the multi-resource agent bottleneck generalized assignment problem. Int. J. Opt. Contr. (IJOCTA) 10(1), 37–46 (Oct 2019). https://doi.org/10.11121/ijocta.01.2020.00796

work page doi:10.11121/ijocta.01.2020.00796 2019
[2]

In: 40th EUROMICRO Conference on Software Engineering and Advanced Applications, SEAA 2014, Verona, Italy, August 27-29, 2014

Campeanu, G., Carlson, J., Sentilles, S.: Component allocation optimization for heterogeneous CPU-GPU embedded systems. In: 40th EUROMICRO Conference on Software Engineering and Advanced Applications, SEAA 2014, Verona, Italy, August 27-29, 2014. pp. 229–236. IEEE Computer Society (2014). https://doi.org/ 10.1109/SEAA.2014.29

work page doi:10.1109/seaa.2014.29 2014
[3]

In: Proceedings of the IEEE Sympo- sium on Application Speciﬁc Processors, SASP 2008, June 8-9, 2008, Anaheim, California, USA

Che, S., Li, J., Sheaﬀer, J.W., Skadron, K., Lach, J.C.: Accelerating compute- intensive applications with gpus and fpgas. In: Proceedings of the IEEE Sympo- sium on Application Speciﬁc Processors, SASP 2008, June 8-9, 2008, Anaheim, California, USA. pp. 101–107. IEEE Computer Society (2008). https://doi.org/10. 1109/SASP.2008.4570793

work page arXiv 2008
[4]

Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2022), https: //www.gurobi.com

work page 2022
[5]

International Journal of Production Research50(2), 309–324 (2012)

Özlem Karsu, Azizoğlu, M.: The multi-resource agent bottleneck generalised as- signment problem. International Journal of Production Research50(2), 309–324 (2012). https://doi.org/10.1080/00207543.2010.538745

work page doi:10.1080/00207543.2010.538745 2012
[6]

International Journal of Computer Science and Information Security14, 263 (03 2016)

Mhadhbi, I., Ben Othman, S., Ben Saoud, S.: A comprehensive survey on hard- ware/software partitioning process in co-design. International Journal of Computer Science and Information Security14, 263 (03 2016)

work page 2016
[7]

ACM Comput

Mittal, S., Vetter, J.S.: A survey of CPU-GPU heterogeneous computing tech- niques. ACM Comput. Surv. 47(4), 69:1–69:35 (2015). https://doi.org/10.1145/ 2788396

work page 2015
[8]

Niemann, R., Marwedel, P.: An algorithm for hardware/software partitioning us- ing mixed integer linear programming. Des. Autom. Embed. Syst.2(2), 165–193 (1997). https://doi.org/10.1023/A:1008832202436

work page doi:10.1023/a:1008832202436 1997
[9]

ACM Trans

Owaida, M., Falcão, G., Andrade, J., Antonopoulos, C.D., Bellas, N., Purnaprajna, M., Novo, D., Karakonstantis, G., Burg, A., Ienne, P.: Enhancing design space ex- ploration by extending CPU/GPU speciﬁcations onto fpgas. ACM Trans. Embed. Comput. Syst. 14(2), 33:1–33:23 (2015). https://doi.org/10.1145/2656207

work page doi:10.1145/2656207 2015
[10]

In: 28th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2021, Bengaluru, India, December 17-20, 2021

Wang,T.,Chang,W.,Srivastava,A.,Kannan,R.,Prasanna,V.K.:Montecarlotree search for task mapping onto heterogeneous platforms. In: 28th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2021, Bengaluru, India, December 17-20, 2021. pp. 63–70. IEEE (2021). https://doi.org/ 10.1109/HiPC53243.2021.00020

work page doi:10.1109/hipc53243.2021.00020 2021
[11]

In: 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020, Waltham, MA, USA, September 22-24, 2020

Wang, T., Srivastava, A., Prasanna, V.K.: A framework for task mapping onto heterogeneous platforms. In: 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020, Waltham, MA, USA, September 22-24, 2020. pp. 1–6. IEEE (2020). https://doi.org/10.1109/HPEC43674.2020.9286211

work page doi:10.1109/hpec43674.2020.9286211 2020

[1] [1]

Bektur, G.: A multi-start iterated tabu search algorithm for the multi-resource agent bottleneck generalized assignment problem. Int. J. Opt. Contr. (IJOCTA) 10(1), 37–46 (Oct 2019). https://doi.org/10.11121/ijocta.01.2020.00796

work page doi:10.11121/ijocta.01.2020.00796 2019

[2] [2]

In: 40th EUROMICRO Conference on Software Engineering and Advanced Applications, SEAA 2014, Verona, Italy, August 27-29, 2014

Campeanu, G., Carlson, J., Sentilles, S.: Component allocation optimization for heterogeneous CPU-GPU embedded systems. In: 40th EUROMICRO Conference on Software Engineering and Advanced Applications, SEAA 2014, Verona, Italy, August 27-29, 2014. pp. 229–236. IEEE Computer Society (2014). https://doi.org/ 10.1109/SEAA.2014.29

work page doi:10.1109/seaa.2014.29 2014

[3] [3]

In: Proceedings of the IEEE Sympo- sium on Application Speciﬁc Processors, SASP 2008, June 8-9, 2008, Anaheim, California, USA

Che, S., Li, J., Sheaﬀer, J.W., Skadron, K., Lach, J.C.: Accelerating compute- intensive applications with gpus and fpgas. In: Proceedings of the IEEE Sympo- sium on Application Speciﬁc Processors, SASP 2008, June 8-9, 2008, Anaheim, California, USA. pp. 101–107. IEEE Computer Society (2008). https://doi.org/10. 1109/SASP.2008.4570793

work page arXiv 2008

[4] [4]

Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2022), https: //www.gurobi.com

work page 2022

[5] [5]

International Journal of Production Research50(2), 309–324 (2012)

Özlem Karsu, Azizoğlu, M.: The multi-resource agent bottleneck generalised as- signment problem. International Journal of Production Research50(2), 309–324 (2012). https://doi.org/10.1080/00207543.2010.538745

work page doi:10.1080/00207543.2010.538745 2012

[6] [6]

International Journal of Computer Science and Information Security14, 263 (03 2016)

Mhadhbi, I., Ben Othman, S., Ben Saoud, S.: A comprehensive survey on hard- ware/software partitioning process in co-design. International Journal of Computer Science and Information Security14, 263 (03 2016)

work page 2016

[7] [7]

ACM Comput

Mittal, S., Vetter, J.S.: A survey of CPU-GPU heterogeneous computing tech- niques. ACM Comput. Surv. 47(4), 69:1–69:35 (2015). https://doi.org/10.1145/ 2788396

work page 2015

[8] [8]

Niemann, R., Marwedel, P.: An algorithm for hardware/software partitioning us- ing mixed integer linear programming. Des. Autom. Embed. Syst.2(2), 165–193 (1997). https://doi.org/10.1023/A:1008832202436

work page doi:10.1023/a:1008832202436 1997

[9] [9]

ACM Trans

Owaida, M., Falcão, G., Andrade, J., Antonopoulos, C.D., Bellas, N., Purnaprajna, M., Novo, D., Karakonstantis, G., Burg, A., Ienne, P.: Enhancing design space ex- ploration by extending CPU/GPU speciﬁcations onto fpgas. ACM Trans. Embed. Comput. Syst. 14(2), 33:1–33:23 (2015). https://doi.org/10.1145/2656207

work page doi:10.1145/2656207 2015

[10] [10]

In: 28th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2021, Bengaluru, India, December 17-20, 2021

Wang,T.,Chang,W.,Srivastava,A.,Kannan,R.,Prasanna,V.K.:Montecarlotree search for task mapping onto heterogeneous platforms. In: 28th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2021, Bengaluru, India, December 17-20, 2021. pp. 63–70. IEEE (2021). https://doi.org/ 10.1109/HiPC53243.2021.00020

work page doi:10.1109/hipc53243.2021.00020 2021

[11] [11]

In: 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020, Waltham, MA, USA, September 22-24, 2020

Wang, T., Srivastava, A., Prasanna, V.K.: A framework for task mapping onto heterogeneous platforms. In: 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020, Waltham, MA, USA, September 22-24, 2020. pp. 1–6. IEEE (2020). https://doi.org/10.1109/HPEC43674.2020.9286211

work page doi:10.1109/hpec43674.2020.9286211 2020