{"total":13,"items":[{"citing_arxiv_id":"2605.14780","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mat2Boundary: Treating User-Defined Boundary Condition as SpMV for Distributed PDE Solvers on Block-Structured Grids","primary_cat":"cs.PL","submitted_at":"2026-05-14T12:49:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Mat2Boundary treats boundary conditions as sparse matrix-vector products and uses multi-stage compilation with polyhedral analysis to generate efficient matrix-free kernels and communication schedules for distributed block-structured PDE solvers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08247","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM Translation of Compiler Intermediate Representation","primary_cat":"cs.PL","submitted_at":"2026-05-07T13:22:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"IRIS-14B is the first LLM trained explicitly for GIMPLE-to-LLVM IR translation and outperforms much larger models by up to 44 percentage points on real-world C code.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"tions in Fortran; (b) specific frontends, e.g., the Rust language is richly supported in LLVM, while legacy Modula-2 codebases are primarily supported via GCC-based frontends; (c) specific backends, e.g., most embedded targets are only supported in GCC; (d) mature optimizations for cross-polination workflows, e.g., domain specific optimizations from MLIR [19]/LLVM and software pipelining from GCC; and (e) sharing compiler-specific tooling, e.g., using LLVM Alive2 [23] in projects relying on GCC. 
An IR-to-IR translator would enable optimization analysis and verification, optimization cross-pollination, and the integration of GCC's and LLVM's frontends and backends. Despite serving a similar role in the compilation pipeline, GIM-"},{"citing_arxiv_id":"2605.04467","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"KEET: Explaining Performance of GPU Kernels Using LLM Agents","primary_cat":"cs.PF","submitted_at":"2026-05-06T03:47:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"KEET uses LLM agents to generate data-grounded natural language explanations of performance issues in GPU kernels from Nsight Compute profiles and shows these improve downstream LLM-based optimization tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03353","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents","primary_cat":"cs.CR","submitted_at":"2026-05-05T04:15:48+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"optimization, and target code generation. The critical architectural insight is the role of the IR: by introducing a unified intermediate layer, compilers decouple frontend language parsing from backend code generation, reducing the m×n support problem to m+n [22-24]. Security optimization at compile time, such as stack canary insertion and bounds checking [25], further demonstrates that compilers can enforce safety properties before code executes. 2.2 Related Work and Challenges Having established the foundational concepts, we now analyze recent work and identify limitations that motivate our approach. 
Format Sensitivity and Skill Retrieval. LLM performance is highly sensitive to prompt formatting, with up to 40% variation from format changes alone [13]."},{"citing_arxiv_id":"2604.20032","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing","primary_cat":"cs.DC","submitted_at":"2026-04-21T22:23:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LEO performs cross-vendor backward slicing from stalled GPU instructions to attribute root causes to source code, enabling optimizations that produce geometric-mean speedups of 1.73-1.82x on 21 workloads.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Compute [2], AMD's rocprofv3 [3], and Intel's VTune [4]/unitrace [5], can report stall distributions measured using PC sampling per GPU instruction. However, these tools show where stalls occur but not why: they present stall breakdowns without identifying which earlier instructions actually caused the observed stalls. Research tools such as GPA (GPU Performance Advisor) [6] pioneered backward slicing for GPUs, but GPA supports only NVIDIA GPUs and cannot trace memory access dependencies through synchronization instructions such as AMD's s_waitcnt. To date, no tool has provided instruction-level root-cause analysis for GPUs from multiple vendors. 
Each vendor exposes a different PC-sampling interface, stall taxonomy, and"},{"citing_arxiv_id":"2604.19906","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Demonstrating a Future for MLIR-native DSL Compilers on a NumPy-like Example","primary_cat":"cs.PL","submitted_at":"2026-04-21T18:30:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"An MLIR-native NumPy-like DSL with a new dialect-agnostic type checker and parallel-first lowering to a dataflow dialect, shown on weather modeling and CFD workloads in Fortran.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16571","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EquivFusion: Unifying Hardware Equivalence Checking from Algorithms to Netlists via MLIR","primary_cat":"cs.AR","submitted_at":"2026-04-17T12:09:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EquivFusion unifies equivalence checking across hardware design levels by lowering PyTorch, C/C++, Chisel, Verilog, and netlists via MLIR into SMT-LIB, BTOR2, and AIGER formats.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09961","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SSA without Dominance for Higher-Order Programs","primary_cat":"cs.PL","submitted_at":"2026-04-10T23:54:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Free-variable sets and a nesting tree can replace dominance relations in SSA for higher-order programs, improving precision without requiring explicit control-flow 
graphs.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"𝜆-calculi handle higher-order programs naturally through block nesting: variables are explicitly scoped by their syntactic structure. However, this syntactic nesting can be too rigid for program transformations. For example, 𝛽-reducing the application f a in the OCaml code in Fig. 1 duplicates g, although g does not depend on f's variable x. For this reason, some IRs, such as Thorin [21], \"CPS soup\" in Guile [39], and MimIR [22], have abolished explicit scoping to simplify program transformations. However, these Authors' Contact Information: Roland Leißa, University of Göttingen, Göttingen, Germany, roland.leissa@cs.uni-goettingen.de; Johannes Griebler, University of Göttingen, Göttingen, Germany, j.griebler@stud.uni-goettingen.de."},{"citing_arxiv_id":"2604.05066","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AutoLALA: Automatic Loop Algebraic Locality Analysis for AI and HPC Kernels","primary_cat":"cs.PL","submitted_at":"2026-04-06T18:12:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"AutoLALA automatically generates symbolic formulas for reuse distance and data movement complexity in affine loop programs using polyhedral lowering and Barvinok counting.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.22267","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR","primary_cat":"cs.AR","submitted_at":"2025-11-27T09:43:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Aquas delivers a holistic hardware-software co-optimization framework on MLIR that 
models memory interfaces with cache effects and uses an e-graph retargetable compiler, achieving up to 15.61x speedup with 14.5% area overhead across four domains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.20782","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Optimism in Equality Saturation","primary_cat":"cs.PL","submitted_at":"2025-11-25T19:19:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A new abstract interpretation algorithm enables sound optimistic analysis of e-graphs during equality saturation, unifying it with non-destructive rewriting and improving precision on cyclic SSA programs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.11277","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic","primary_cat":"math.NA","submitted_at":"2025-06-12T20:33:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Error analysis and cost estimator for recasting floating-point matrix multiplication as accumulated integer products on mixed-precision hardware.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2404.11591","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The EDGE Language: Extended General Einsums for Graph Algorithms","primary_cat":"cs.DS","submitted_at":"2024-04-17T17:42:48+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}