TCM finds provably optimal DNN accelerator mappings by pruning the search space up to 32 orders of magnitude with a new dataplacement concept, delivering 1.2-6.5x better energy-delay-product in 17 seconds instead of hours.
Oliver, Benjamin Lienhard, and Swamit Tannu
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
CHIA introduces a framework for building and deploying agentic AI co-design flows as CHIA loops with tool nodes, reliability mechanisms, and five case-study demonstrations.
FFM finds optimal fused mappings for tensor accelerators over 10,000 times faster than prior mappers while cutting energy-delay product by up to 1.8x versus hand-tuned designs.
Mambalaya delivers 4.9x prefill and 1.9x generation speedups on Mamba layers over prior accelerators by systematically fusing inter-Einsum operations.
LUNA achieves up to 10.95x area reduction and 30% lower latency for qubit readout using integrator-based preprocessing and LogicNet LUT synthesis with minimal fidelity loss.
citing papers explorer
-
The Turbo-Charged Mapper: Fast and Optimal Mapping for Energy-efficient and Low-latency Accelerator Design
TCM finds provably optimal DNN accelerator mappings by pruning the search space up to 32 orders of magnitude with a new dataplacement concept, delivering 1.2-6.5x better energy-delay-product in 17 seconds instead of hours.
-
CHIA: An open-source framework for principled, agentic AI-driven hardware/software co-design research
CHIA introduces a framework for building and deploying agentic AI co-design flows as CHIA loops with tool nodes, reliability mechanisms, and five case-study demonstrations.
-
Fast and Fusiest: An Optimal Fusion-Aware Mapper for Accelerator Design
FFM finds optimal fused mappings for tensor accelerators over 10,000 times faster than prior mappers while cutting energy-delay product by up to 1.8x versus hand-tuned designs.
-
Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models
Mambalaya delivers 4.9x prefill and 1.9x generation speedups on Mamba layers over prior accelerators by systematically fusing inter-Einsum operations.
-
LUNA: LUT-Based Neural Architecture for Fast and Low-Cost Qubit Readout
LUNA achieves up to 10.95x area reduction and 30% lower latency for qubit readout using integrator-based preprocessing and LogicNet LUT synthesis with minimal fidelity loss.
- Photonic Quantum Computing on Spin Memory Architecture with Tree-Encoded Fusion