IRIS-14B is the first LLM trained explicitly for GIMPLE-to-LLVM IR translation and outperforms much larger models by up to 44 percentage points on real-world C code.
MLIR: Scaling Compiler Infrastructure for Domain Specific Computation
13 Pith papers cite this work.
Citation roles: background (4)
Citation polarities: background (4)

Representative citing papers:
Mat2Boundary treats boundary conditions as sparse matrix-vector products and uses multi-stage compilation with polyhedral analysis to generate efficient matrix-free kernels and communication schedules for distributed block-structured PDE solvers.
An MLIR-native NumPy-like DSL with a new dialect-agnostic type checker and parallel-first lowering to a dataflow dialect, shown on weather modeling and CFD workloads in Fortran.
Free-variable sets and a nesting tree can replace dominance relations in SSA for higher-order programs, improving precision without requiring explicit control-flow graphs.
A new abstract interpretation algorithm enables sound optimistic analysis of e-graphs during equality saturation, unifying it with non-destructive rewriting and improving precision on cyclic SSA programs.
LEO performs cross-vendor backward slicing from stalled GPU instructions to attribute root causes to source code, enabling optimizations that produce geometric-mean speedups of 1.73–1.82× on 21 workloads.
EquivFusion unifies equivalence checking across hardware design levels by lowering PyTorch, C/C++, Chisel, Verilog, and netlists via MLIR into SMT-LIB, BTOR2, and AIGER formats.
KEET uses LLM agents to generate data-grounded natural language explanations of performance issues in GPU kernels from Nsight Compute profiles and shows these improve downstream LLM-based optimization tasks.
Aquas delivers a holistic hardware-software co-optimization framework on MLIR that models memory interfaces with cache effects and uses an e-graph-based retargetable compiler, achieving up to 15.61× speedup with 14.5% area overhead across four domains.
An error analysis and cost estimator for recasting floating-point matrix multiplication as accumulated integer products on mixed-precision hardware.
AutoLALA automatically generates symbolic formulas for reuse distance and data movement complexity in affine loop programs using polyhedral lowering and Barvinok counting.
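To make the quantity in the AutoLALA entry concrete, here is a minimal brute-force sketch of reuse distance over a concrete access trace. It uses one common definition: the reuse distance of an access is the number of distinct addresses touched since the previous access to the same address, and first-time accesses have infinite distance (shown as `None`). This is not AutoLALA's method, which derives symbolic formulas through polyhedral lowering and Barvinok counting rather than enumerating a trace. The functions `reuse_distances` and `trace_row_sweep`, and the loop nest they model, are hypothetical and chosen only for illustration.

```python
from typing import Hashable, List, Optional, Sequence


def reuse_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """Brute-force reuse distance for each access in `trace`.

    Distance = number of distinct addresses accessed strictly between
    this access and the previous access to the same address.
    First-time accesses get None (infinite reuse distance).
    """
    last_seen = {}  # address -> index of its most recent access
    result: List[Optional[int]] = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct addresses touched since the previous access to addr.
            window = trace[last_seen[addr] + 1 : i]
            result.append(len(set(window)))
        else:
            result.append(None)
        last_seen[addr] = i
    return result


def trace_row_sweep(n: int, m: int) -> List[tuple]:
    """Hypothetical affine loop nest, flattened into its access trace:

        for i in range(n):
            for j in range(m):
                read A[j]
    """
    return [("A", j) for _ in range(n) for j in range(m)]


if __name__ == "__main__":
    # For n=3, m=4: the first sweep is all cold misses, and every later
    # access to A[j] sees the other m - 1 elements in between.
    print(reuse_distances(trace_row_sweep(3, 4)))
```

For this loop nest every non-cold access has reuse distance m − 1, regardless of n. That is the kind of closed-form expression a symbolic tool aims to produce directly, without materializing a trace whose length grows with n · m.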