Characterizing Real-World Bugs in Tile Programs for Automated Bug Detection
Pith reviewed 2026-05-20 04:34 UTC · model grok-4.3
The pith
Tile program code generation bugs follow patterns tied to input shapes and compilation stages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This paper presents the first systematic study of tile-program code generation bugs. We curate 401 bug reports from GitHub and identify 301 tile-program codegen bugs for analysis, categorizing the root causes, symptoms, input patterns, test oracles that trigger these bugs, and the strategies used to fix bugs. Our study provides foundational insights for building debugging, testing, and repair tools tailored to tile-based compiler infrastructures.
What carries the argument
Manual curation and categorization of 301 tile-program bugs drawn from GitHub, organized by root causes in multi-stage compilation, symptoms, input shapes, data types, backend targets, and developer fix strategies.
If this is right
- The identified input patterns can be used to generate more effective test cases for tile compilers.
- Common symptoms point to places where silent errors are most likely to appear in production GPU kernels.
- Fix strategies show that repair tools must incorporate knowledge of tile abstractions and pipeline stages.
- Categorization of test oracles suggests concrete checkers that current general-purpose compiler testers lack.
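To make the first and last points concrete, here is a minimal sketch of a shape-sweep differential oracle. It is not code from the paper or from any real tile framework: the toy kernel, the tile width `TILE`, the planted tail-handling bug, and all function names are assumptions chosen for illustration. The idea is only that sweeping input sizes around tile boundaries and comparing against a trusted reference exposes silent, shape-dependent errors.

```python
# Illustrative only: a toy "tile kernel" in pure Python, not code from any real framework.
TILE = 4  # hypothetical tile width


def reference_sum(xs):
    """Trusted oracle: plain sequential sum."""
    total = 0
    for x in xs:
        total += x
    return total


def tiled_sum_buggy(xs, tile=TILE):
    """Sums full tiles only; the missing tail handling is a deliberately planted bug."""
    total = 0
    full_tiles = len(xs) // tile
    for t in range(full_tiles):
        for i in range(t * tile, (t + 1) * tile):
            total += xs[i]
    return total


def tiled_sum_fixed(xs, tile=TILE):
    """Same loop structure, but the last tile is clamped to the input length."""
    total = 0
    num_tiles = (len(xs) + tile - 1) // tile  # ceiling division
    for t in range(num_tiles):
        for i in range(t * tile, min((t + 1) * tile, len(xs))):
            total += xs[i]
    return total


def failing_shapes(kernel, sizes):
    """Differential oracle: return the sizes where kernel and reference disagree."""
    bad = []
    for n in sizes:
        xs = list(range(1, n + 1))
        if kernel(xs) != reference_sum(xs):
            bad.append(n)
    return bad


sizes = range(1, 13)
buggy_failures = failing_shapes(tiled_sum_buggy, sizes)
fixed_failures = failing_shapes(tiled_sum_fixed, sizes)
print(buggy_failures)  # sizes that are not multiples of TILE
print(fixed_failures)  # empty list
```

The buggy variant never crashes; it only returns a wrong value for sizes that are not a multiple of the tile width, which is the kind of silent failure a crash-based oracle would miss.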
Where Pith is reading between the lines
- Automated detection systems could encode the reported root-cause patterns as static or dynamic checks inside tile compilers.
- The same curation method could be applied to bugs in other high-performance DSLs that use multi-stage code generation.
- The categories supply a benchmark set that future testing tools for tile programs can be measured against.
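If the categories were used as a benchmark, one simple way to score a detection tool would be recall per category. The sketch below assumes a hypothetical record format of `(bug_id, category)` pairs; the ids, category names, and detected set are placeholders invented for illustration and are not taken from the paper's dataset.

```python
from collections import Counter


def recall_by_category(labeled_bugs, detected_ids):
    """Return the fraction of bugs in each category that the tool flagged.

    labeled_bugs: iterable of (bug_id, category) pairs.
    detected_ids: set of bug ids reported by the tool under evaluation.
    """
    totals = Counter()
    hits = Counter()
    for bug_id, category in labeled_bugs:
        totals[category] += 1
        if bug_id in detected_ids:
            hits[category] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}


# Placeholder records, invented for illustration only.
benchmark = [
    ("bug-1", "category-A"),
    ("bug-2", "category-A"),
    ("bug-3", "category-B"),
    ("bug-4", "category-B"),
]
detected = {"bug-1", "bug-3", "bug-4"}
scores = recall_by_category(benchmark, detected)
print(scores)
```

Reporting recall per category rather than one aggregate number keeps a tool from looking strong merely because it handles the most common category well.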
Load-bearing premise
The 401 GitHub bug reports are representative of real-world tile-program code generation bugs and the manual labels for root causes and symptoms are reliable even without full reproduction environments or original developer intent.
What would settle it
A new collection of tile-program bugs from additional repositories or production runs that shows substantially different distributions of root causes or input triggers would falsify the reported categories.
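One way to quantify "substantially different distributions" is the total variation distance between the category proportions of two bug collections. This is a sketch under stated assumptions: the page does not specify a metric or threshold, the counts below are placeholders rather than data from the paper, and a chi-square test of homogeneity would be a more formal alternative.

```python
def proportions(counts):
    """Convert raw category counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one bug")
    return {category: n / total for category, n in counts.items()}


def total_variation(counts_a, counts_b):
    """Total variation distance between two category distributions (0 = identical, 1 = disjoint)."""
    pa = proportions(counts_a)
    pb = proportions(counts_b)
    categories = set(pa) | set(pb)
    return 0.5 * sum(abs(pa.get(c, 0.0) - pb.get(c, 0.0)) for c in categories)


# Placeholder counts, invented for illustration only.
original = {"cause-A": 50, "cause-B": 50}
same_mix = {"cause-A": 5, "cause-B": 5}
skewed = {"cause-A": 100}

distance_same = total_variation(original, same_mix)
distance_skewed = total_variation(original, skewed)
print(distance_same, distance_skewed)
```

Because the distance works on proportions, collections of different sizes can be compared directly; what value counts as "substantial" remains a judgment call.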
Original abstract
Tile-based programming frameworks are increasingly adopted to write high-performance GPU kernels in domains such as deep learning and scientific computing. While these frameworks enhance productivity and hardware utilization, their multi-stage compilation pipelines introduce distinct code generation bugs that are tightly coupled to input shapes, data types, and backend targets. These bugs often manifest as silent correctness or performance issues, making them difficult to detect using existing compiler testing tools. Additionally, the unique programming conventions of tile domain-specific languages complicate root cause identification, while fixing such bugs demands specialized knowledge of tile abstractions and compilation pipelines. Despite the growing adoption of tile-based systems, their code generation bugs remain largely unexplored. This paper presents the first systematic study of tile-program code generation bugs. We curate 401 bug reports from GitHub and identify 301 tile-program codegen bugs for analysis, categorizing the root causes, symptoms, input patterns, test oracles that trigger these bugs, and the strategies used to fix bugs. Our study provides foundational insights for building debugging, testing, and repair tools tailored to tile-based compiler infrastructures.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first systematic study of tile-program code generation bugs in tile-based programming frameworks for high-performance GPU kernels. The authors curate 401 bug reports from GitHub, filter to 301 tile-program codegen bugs, and categorize root causes, symptoms, input patterns, test oracles that trigger the bugs, and fix strategies to provide insights for building specialized debugging, testing, and repair tools.
Significance. If the curation process and manual categorization prove reliable and representative, this empirical study would deliver valuable foundational data on an underexplored class of silent, shape- and backend-coupled bugs that evade standard compiler testing. It could directly inform tool-building for tile-based compiler infrastructures in deep learning and scientific computing. The work earns credit for grounding analysis in real GitHub reports rather than synthetic cases and for producing a multi-dimensional taxonomy (root causes through fixes) that is actionable for practitioners.
major comments (2)
- [Methodology / Data Collection] The description of curating 401 GitHub reports and identifying 301 codegen bugs provides no details on selection criteria, inclusion/exclusion rules, inter-rater agreement, or validation against reproduction environments. This is load-bearing for the central claim that the resulting taxonomy reflects real-world tile-program bugs, as GitHub issues are self-selected and often lack full context or developer intent.
- [Categorization and Analysis] Manual categorization of root causes, symptoms, and oracles from titles, descriptions, and comments alone risks misattribution (e.g., shape-dependent silent errors labeled as performance issues). Without reported measures of labeling reliability or access to reproduction scripts, the categories' correctness cannot be assessed, weakening the insights offered for automated bug detection.
minor comments (2)
- [Introduction] Clarify the exact definition of 'tile-program codegen bug' versus usage error or unrelated defect early in the paper to aid reader interpretation of the 301 cases.
- [Results] Consider adding a table or figure summarizing the distribution of categories (e.g., percentage of bugs per root cause) for quicker overview.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments point by point below and outline the revisions we will make to strengthen the transparency of our methodology and analysis.
- Referee: [Methodology / Data Collection] The description of curating 401 GitHub reports and identifying 301 codegen bugs provides no details on selection criteria, inclusion/exclusion rules, inter-rater agreement, or validation against reproduction environments. This is load-bearing for the central claim that the resulting taxonomy reflects real-world tile-program bugs, as GitHub issues are self-selected and often lack full context or developer intent.
  Authors: We agree that greater transparency in the curation process is required. In the revised manuscript we will expand the Methodology section with explicit inclusion/exclusion criteria used to select the 301 tile-program codegen bugs from the initial 401 reports, along with inter-rater agreement statistics obtained when multiple authors independently reviewed the issues. We will also clarify the degree to which we could validate reports against available code snippets and developer comments, while acknowledging that complete reproduction environments are not provided in most GitHub issues.
  Revision: yes
- Referee: [Categorization and Analysis] Manual categorization of root causes, symptoms, and oracles from titles, descriptions, and comments alone risks misattribution (e.g., shape-dependent silent errors labeled as performance issues). Without reported measures of labeling reliability or access to reproduction scripts, the categories' correctness cannot be assessed, weakening the insights offered for automated bug detection.
  Authors: We acknowledge the inherent limitations of text-based manual categorization. The revised version will include a new subsection that defines each category with concrete examples and reports inter-annotator agreement measures for the labeling process. We will explicitly discuss the unavailability of reproduction scripts for many reports as a limitation and describe how we reduced misattribution risk by cross-referencing issue comments and code fragments. These changes will allow readers to better evaluate the taxonomy's reliability for guiding automated bug detection tools.
  Revision: yes
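The rebuttal promises inter-rater agreement statistics without naming one. As a sketch of what such a figure could look like, the snippet below computes Cohen's kappa for two annotators; choosing kappa is an assumption here, not something the authors state, and the label lists are placeholders invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Placeholder labels, invented for illustration only.
annotator_1 = ["x", "x", "y", "y"]
annotator_2 = ["x", "x", "y", "x"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(kappa)
```

In this toy case the annotators agree on three of four items (0.75 raw agreement), but kappa is lower because half of that agreement would be expected by chance given how often each annotator uses each label.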
Circularity Check
No circularity in empirical bug categorization study
This paper is an empirical study that curates 401 GitHub bug reports, filters to 301 tile-program codegen cases, and performs manual categorization of root causes, symptoms, input patterns, oracles, and fixes. It contains no mathematical derivations, equations, fitted parameters, or self-referential definitions. All claims rest on external data sources and direct inspection rather than any internal reduction or self-citation chain, rendering the analysis self-contained with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: GitHub bug reports provide a representative and unbiased sample of real-world tile-program code generation bugs.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: We curate 401 bug reports from GitHub and identify 301 tile-program codegen bugs for analysis, categorizing the root causes, symptoms, input patterns, test oracles that trigger these bugs, and the strategies used to fix bugs.
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: absolute_floor_iff_bare_distinguishability
  Tag: unclear (relation between the paper passage and the cited Recognition theorem).
  Passage: Table 2. The taxonomy of bug causes … Type and Operator Bugs … 147 (48.84%)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.