pith. sign in

arxiv: 2208.06321 · v1 · submitted 2022-08-12 · 💻 cs.DC · math.OC

Modeling Task Mapping for Data-intensive Applications in Heterogeneous Systems

Pith reviewed 2026-05-24 11:57 UTC · model grok-4.3

classification 💻 cs.DC math.OC
keywords task mappingheterogeneous systemsmixed-integer linear programmingcommunication modelingparallelizabilitystreamabilitydata-intensive applications
0
0 comments X

The pith

A new model for task mapping in heterogeneous systems incorporates communication between devices along with device-specific parallelizability and streamability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a model for assigning tasks to heterogeneous hardware like CPUs, GPUs and FPGAs. The model treats inter-device communication as a factor that shapes parallel execution and adds separate parameters for how readily each device can run tasks in parallel or as streams. The authors then encode the model in two mixed-integer linear programs that can be solved to produce concrete mappings. A reader would care because data-intensive workloads often run across mixed accelerators, and explicit handling of these effects can improve overall runtime predictions and assignments.

Core claim

The authors introduce a model that represents the task mapping problem by explicitly modeling communication costs between devices, the potential for parallel execution across devices, and the streamability and parallelizability specific to each device type. They demonstrate its utility by formulating two mixed-integer linear programs that optimize task assignments under these considerations.

What carries the argument

The task mapping model that incorporates communication influence on parallel execution and device-specific parallelizability and streamability.

If this is right

  • The model supports task mapping decisions in multiple phases of heterogeneous system design.
  • Two new mixed-integer linear programs become available that directly encode the model.
  • Communication is treated as an explicit constraint on achievable parallelism rather than an afterthought.
  • Device differences in streamability are represented as first-class parameters in the optimization.
  • pith_inferences=[

Load-bearing premise

The model correctly captures the dominant effects of inter-device communication, parallelizability, and streamability on overall execution time in a form that mixed-integer linear programming can use without essential loss of accuracy.

What would settle it

Run the two mixed-integer linear programs on a real heterogeneous platform with measured communication latencies and device throughput curves for a data-intensive workload, then compare predicted versus measured end-to-end runtimes for the produced mappings.

Figures

Figures reproduced from arXiv: 2208.06321 by Anna Drewes, Hanna Geppert, Martin Wilhelm, Thilo Pionteck.

Figure 1
Figure 1. Figure 1: Sample memory￾augmented task graph with three tasks, one source and one sink. In the hardware model, we assume that (1) each computation device is connected to (at least) one associated memory, (2) data transfer can only happen between memories (not between computa￾tion devices) and (3) the computation of a device is blocked by a memory transfer from or to the as￾sociated memory. Usually, the associated me… view at source ↗
Figure 2
Figure 2. Figure 2: Mappings found by the device-based (left) and the time-based (right) LP for a small sample graph. For each node, the parallelizability p and complexity factor c are given. At the edges and in the nodes, the time windows for transport and computation are annotated. For all tasks, the chosen input and output memory are identical, the corresponding nodes are omitted for readability. not restricted to follow a… view at source ↗
read the original abstract

We introduce a new model for the task mapping problem to aid in the systematic design of algorithms for heterogeneous systems including, but not limited to, CPUs, GPUs and FPGAs. A special focus is set on the communication between the devices, its influence on parallel execution, as well as on device-specific differences regarding parallelizability and streamability. We show how this model can be utilized in different system design phases and present two novel mixed-integer linear programs to demonstrate the usage of the model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a new model for the task mapping problem in heterogeneous systems (CPUs, GPUs, FPGAs) with emphasis on inter-device communication, device-specific parallelizability, and streamability. It encodes the model in two novel MILPs and illustrates their use on example task graphs to support systematic algorithm design across system phases.

Significance. If the model and its MILP encodings prove accurate, the work would supply a reusable formal framework for optimizing mappings while accounting for communication overheads and device heterogeneity, which is a recognized need in data-intensive heterogeneous computing. The explicit construction of two MILPs is a concrete strength that allows direct solver-based exploration.

major comments (2)
  1. [Abstract] Abstract and model-definition sections: the central claim that the model 'correctly captures the dominant effects' of communication, parallelizability, and streamability and that these effects 'can be expressed in MILP form without losing essential accuracy' is unsupported by any comparison of MILP-derived execution times or mappings against measured wall-clock times on real heterogeneous hardware; without such data the linearization choices cannot be verified to preserve the claimed fidelity.
  2. [MILP formulation and examples] Illustration of MILP usage (example task graphs): the manuscript shows mappings produced by the two MILPs but reports no quantitative check that the predicted makespans or communication costs align with actual execution on CPU/GPU/FPGA platforms, leaving open the possibility that abstraction choices required for MILP tractability omit or distort the very effects the model is asserted to capture.
minor comments (1)
  1. [Model definition] Notation for device-specific parameters (parallelism factors, streamability) should be introduced with an explicit table or list of symbols to improve readability when the MILPs are presented.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We address the major comments below regarding the lack of empirical validation on real hardware.

read point-by-point responses
  1. Referee: [Abstract] Abstract and model-definition sections: the central claim that the model 'correctly captures the dominant effects' of communication, parallelizability, and streamability and that these effects 'can be expressed in MILP form without losing essential accuracy' is unsupported by any comparison of MILP-derived execution times or mappings against measured wall-clock times on real heterogeneous hardware; without such data the linearization choices cannot be verified to preserve the claimed fidelity.

    Authors: We acknowledge that the manuscript provides no direct comparison between the MILP predictions and actual hardware measurements. The contribution of this work lies in the introduction of a task mapping model that explicitly incorporates inter-device communication, device-specific parallelizability, and streamability, along with two MILP formulations to enable its use in optimization. The illustrative examples demonstrate the model's application but are not intended as empirical validation. To align the claims with the presented evidence, we will revise the abstract to describe the model as one that accounts for these effects and can be encoded in MILP, removing assertions of 'correctly captures the dominant effects' and 'without losing essential accuracy' that imply unprovided validation. This change will be incorporated in the revised manuscript. revision: yes

  2. Referee: [MILP formulation and examples] Illustration of MILP usage (example task graphs): the manuscript shows mappings produced by the two MILPs but reports no quantitative check that the predicted makespans or communication costs align with actual execution on CPU/GPU/FPGA platforms, leaving open the possibility that abstraction choices required for MILP tractability omit or distort the very effects the model is asserted to capture.

    Authors: The referee is correct that no such quantitative alignment with real platform executions is reported. The examples serve to illustrate how the MILPs can be solved to obtain mappings under the model. Since the paper does not include hardware experiments, we cannot provide such checks here. We will revise the manuscript to explicitly state in the relevant sections that the examples are illustrative of the model's usage and that empirical validation against real executions is left for future work. This will help manage expectations regarding the fidelity of the linearizations. revision: yes

Circularity Check

0 steps flagged

Model introduced definitionally; no fitted predictions or self-citation chains

full rationale

The manuscript introduces a new model for task mapping and encodes it directly into two MILPs as an original construction. No equations, parameters, or predictions are fitted to data and then re-used as outputs; the central claims rest on the definitional adequacy of the model rather than any reduction to prior results. No load-bearing self-citations appear in the provided text, and the work contains no empirical prediction step that could collapse into its own inputs. The derivation is therefore self-contained as a modeling contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the model itself is the central addition whose internal structure remains undisclosed.

pith-pipeline@v0.9.0 · 5605 in / 940 out tokens · 31029 ms · 2026-05-24T11:57:47.692117+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages

  1. [1]

    Bektur, G.: A multi-start iterated tabu search algorithm for the multi-resource agent bottleneck generalized assignment problem. Int. J. Opt. Contr. (IJOCTA) 10(1), 37–46 (Oct 2019). https://doi.org/10.11121/ijocta.01.2020.00796

  2. [2]

    In: 40th EUROMICRO Conference on Software Engineering and Advanced Applications, SEAA 2014, Verona, Italy, August 27-29, 2014

    Campeanu, G., Carlson, J., Sentilles, S.: Component allocation optimization for heterogeneous CPU-GPU embedded systems. In: 40th EUROMICRO Conference on Software Engineering and Advanced Applications, SEAA 2014, Verona, Italy, August 27-29, 2014. pp. 229–236. IEEE Computer Society (2014). https://doi.org/ 10.1109/SEAA.2014.29

  3. [3]

    In: Proceedings of the IEEE Sympo- sium on Application Specific Processors, SASP 2008, June 8-9, 2008, Anaheim, California, USA

    Che, S., Li, J., Sheaffer, J.W., Skadron, K., Lach, J.C.: Accelerating compute- intensive applications with gpus and fpgas. In: Proceedings of the IEEE Sympo- sium on Application Specific Processors, SASP 2008, June 8-9, 2008, Anaheim, California, USA. pp. 101–107. IEEE Computer Society (2008). https://doi.org/10. 1109/SASP.2008.4570793

  4. [4]

    Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2022), https: //www.gurobi.com

  5. [5]

    International Journal of Production Research50(2), 309–324 (2012)

    Özlem Karsu, Azizoğlu, M.: The multi-resource agent bottleneck generalised as- signment problem. International Journal of Production Research50(2), 309–324 (2012). https://doi.org/10.1080/00207543.2010.538745

  6. [6]

    International Journal of Computer Science and Information Security14, 263 (03 2016)

    Mhadhbi, I., Ben Othman, S., Ben Saoud, S.: A comprehensive survey on hard- ware/software partitioning process in co-design. International Journal of Computer Science and Information Security14, 263 (03 2016)

  7. [7]

    ACM Comput

    Mittal, S., Vetter, J.S.: A survey of CPU-GPU heterogeneous computing tech- niques. ACM Comput. Surv. 47(4), 69:1–69:35 (2015). https://doi.org/10.1145/ 2788396

  8. [8]

    Niemann, R., Marwedel, P.: An algorithm for hardware/software partitioning us- ing mixed integer linear programming. Des. Autom. Embed. Syst.2(2), 165–193 (1997). https://doi.org/10.1023/A:1008832202436

  9. [9]

    ACM Trans

    Owaida, M., Falcão, G., Andrade, J., Antonopoulos, C.D., Bellas, N., Purnaprajna, M., Novo, D., Karakonstantis, G., Burg, A., Ienne, P.: Enhancing design space ex- ploration by extending CPU/GPU specifications onto fpgas. ACM Trans. Embed. Comput. Syst. 14(2), 33:1–33:23 (2015). https://doi.org/10.1145/2656207

  10. [10]

    In: 28th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2021, Bengaluru, India, December 17-20, 2021

    Wang,T.,Chang,W.,Srivastava,A.,Kannan,R.,Prasanna,V.K.:Montecarlotree search for task mapping onto heterogeneous platforms. In: 28th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2021, Bengaluru, India, December 17-20, 2021. pp. 63–70. IEEE (2021). https://doi.org/ 10.1109/HiPC53243.2021.00020

  11. [11]

    In: 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020, Waltham, MA, USA, September 22-24, 2020

    Wang, T., Srivastava, A., Prasanna, V.K.: A framework for task mapping onto heterogeneous platforms. In: 2020 IEEE High Performance Extreme Computing Conference, HPEC 2020, Waltham, MA, USA, September 22-24, 2020. pp. 1–6. IEEE (2020). https://doi.org/10.1109/HPEC43674.2020.9286211