A new partitioning algorithm that provably load-balances arbitrary sparse tensor algebra expressions by generalizing parallel merging to multi-operand, multi-dimensional hierarchical structures, implemented in a compiler framework.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (3)
polarities: background (3)
representative citing papers
Affinity Tailor improves per-CPU throughput by 12% on chiplet systems and 3% on non-chiplet systems over Linux CFS by using dynamic compact affinity hints derived from online demand estimates.
Duon eliminates TLB shootdown and cache invalidation costs during page migration in flat-address hybrid memory systems by updating mappings in-place, delivering 3.87% IPC gains over prior methods.
citing papers explorer
- Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution
  A new partitioning algorithm that provably load-balances arbitrary sparse tensor algebra expressions by generalizing parallel merging to multi-operand, multi-dimensional hierarchical structures, implemented in a compiler framework.
- Affinity Tailor: Dynamic Locality-Aware Scheduling at Scale
  Affinity Tailor improves per-CPU throughput by 12% on chiplet systems and 3% on non-chiplet systems over Linux CFS by using dynamic compact affinity hints derived from online demand estimates.
- Efficient Page Migration in Hybrid Memory Systems
  Duon eliminates TLB shootdown and cache invalidation costs during page migration in flat-address hybrid memory systems by updating mappings in-place, delivering 3.87% IPC gains over prior methods.