Evaluating SYCL as a Unified Programming Model for Heterogeneous Systems
Pith reviewed 2026-05-10 07:58 UTC · model grok-4.3
The pith
SYCL implementations show inconsistencies in memory management and parallelism that undermine cross-platform reliability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that SYCL does not deliver consistent cross-platform behavior in its core abstractions. The USM and buffer-accessor memory models produce different results, and the NDRange and hierarchical parallelism models vary in efficiency and correctness across implementations. These inconsistencies expose limitations that affect reliability and usability.
What carries the argument
Direct comparison of SYCL's Unified Shared Memory (USM) versus buffer-accessor memory models and NDRange versus hierarchical kernel models, evaluated through application benchmarks and literature synthesis.
If this is right
- Developers cannot assume seamless portability and must validate code behavior on each target platform.
- Standardization efforts need to address variations in memory management to improve consistency.
- Productivity gains from SYCL are reduced by the need for platform-specific debugging and tuning.
- Future framework updates could prioritize consistent runtime behavior across implementations to meet SYCL's original design goals.
Where Pith is reading between the lines
- These findings suggest that unified models face inherent challenges when supporting diverse hardware without vendor-specific extensions.
- Similar evaluation methods could be applied to other portable frameworks to identify comparable gaps.
- Adoption in production HPC workloads may require supplementary tools for detecting implementation differences.
Load-bearing premise
The chosen benchmarks and illustrative examples are sufficient to represent the full range of real-world HPC application behaviors and SYCL usage patterns.
What would settle it
Running the paper's benchmark suite on additional SYCL implementations, such as ComputeCpp, and on other vendors' hardware would show whether the reported inconsistencies persist or are limited to the tested Intel platforms.
Abstract
High-performance computing (HPC) applications are increasingly executed in heterogeneous environments, introducing new challenges for programming and software portability. SYCL has emerged as a leading model designed to simplify heterogeneous programming and make it more accessible to developers. Intended as a single-source, cross-platform parallel programming framework, SYCL promises portability, productivity, and performance across a variety of architectures. However, these goals have not been consistently defined or realized, leaving developers with varying expectations. This paper addresses this gap by evaluating SYCL from the perspective of application developers. We analyze whether SYCL meets essential criteria for cross-platform development, including code portability, development productivity, and runtime efficiency. Our evaluation draws on benchmarks and illustrative examples and focuses on SYCL's memory management and parallelism abstractions. We provide detailed comparisons between Unified Shared Memory (USM) and buffer-accessor approaches, as well as between NDRange and hierarchical kernel models. In addition to presenting our own benchmark results on Intel platforms, we synthesize findings from recent studies across multiple SYCL implementations and compilers. Our results expose key limitations and inconsistencies in current SYCL implementations and offer insights into the steps needed to improve the framework's reliability and cross-platform usability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates SYCL as a single-source programming model for heterogeneous HPC systems. It compares USM versus buffer-accessor memory models and NDRange versus hierarchical kernel models, reports new benchmark results on Intel platforms, and synthesizes findings from prior studies across implementations. It concludes that current SYCL implementations exhibit limitations and inconsistencies in portability, productivity, and efficiency that must be addressed for reliable cross-platform use.
Significance. If the empirical comparisons and synthesized observations hold under broader validation, the work would provide concrete, developer-oriented guidance on SYCL's practical shortcomings, helping prioritize improvements in memory abstractions and kernel models that affect real heterogeneous workloads.
major comments (2)
- [§4] §4 (Benchmark Methodology) and associated tables/figures: the experimental setup lacks explicit hardware details, compiler versions, run counts, statistical significance tests, or exclusion criteria for outliers, making it impossible to determine whether the reported performance differences and inconsistencies are robust or sensitive to configuration choices.
- [Abstract, §5] Abstract and §5 (Discussion/Synthesis): the central claim that the results expose 'key limitations and inconsistencies' generalizable to SYCL relies on the assumption that the selected Intel-focused workloads and illustrative examples are representative of diverse HPC memory access patterns, parallelism granularities, and cross-device behaviors; no justification or coverage argument is provided for this representativeness.
minor comments (2)
- [§2] Notation for the USM/buffer and NDRange/hierarchical variants is introduced without a consolidated table of abbreviations or a clear mapping to the relevant sections of the SYCL 2020 specification.
- [§5] Several synthesized findings from prior studies are cited without page numbers or direct quotation of the original performance numbers, complicating verification.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address the two major comments point by point below and will revise the paper to strengthen the experimental description and the justification of our claims.
- Referee: [§4] §4 (Benchmark Methodology) and associated tables/figures: the experimental setup lacks explicit hardware details, compiler versions, run counts, statistical significance tests, or exclusion criteria for outliers, making it impossible to determine whether the reported performance differences and inconsistencies are robust or sensitive to configuration choices.
  Authors: We agree that the experimental setup in §4 is insufficiently detailed. In the revised manuscript we will expand this section to specify the exact hardware platforms (Intel CPU and GPU models), compiler versions and build flags (including the Intel oneAPI DPC++ version), the number of repetitions for each benchmark, the statistical reporting method (means and standard deviations), and the outlier exclusion criteria. These additions will allow readers to evaluate the robustness of the observed differences between USM/buffer and NDRange/hierarchical models.
  Revision: yes.
- Referee: [Abstract, §5] Abstract and §5 (Discussion/Synthesis): the central claim that the results expose 'key limitations and inconsistencies' generalizable to SYCL relies on the assumption that the selected Intel-focused workloads and illustrative examples are representative of diverse HPC memory access patterns, parallelism granularities, and cross-device behaviors; no justification or coverage argument is provided for this representativeness.
  Authors: We acknowledge that the manuscript does not provide an explicit coverage argument for the representativeness of the chosen workloads. While the benchmarks illustrate common heterogeneous patterns and we synthesize results from multiple prior studies across SYCL implementations, a dedicated justification is missing. In the revision we will add a short paragraph in §5 that explains how the selected workloads span regular/irregular memory accesses and different parallelism granularities, referencing standard HPC benchmark suites, and we will explicitly state the Intel-centric scope of the new measurements together with the limitations this imposes on generalizability.
  Revision: yes.
Circularity Check
No circularity: empirical evaluation without derivations or self-referential reductions
The paper is an empirical evaluation of SYCL using benchmarks, illustrative examples, and synthesis of recent studies. It contains no mathematical derivations, equations, parameter fitting, predictions, or uniqueness theorems. Claims about limitations in portability, productivity, and efficiency are supported by reported benchmark results on Intel platforms and external literature synthesis, without reducing to self-definition or fitted inputs by construction. No load-bearing self-citations of the enumerated kinds appear in the provided text.