TCM finds provably optimal DNN accelerator mappings by pruning the search space by up to 32 orders of magnitude with a new dataplacement concept, delivering 1.2-6.5x better energy-delay product in 17 seconds instead of hours.
Oliver, Benjamin Lienhard, and Swamit Tannu
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 2
citation-polarity summary: background 2
representative citing papers
CHIA introduces a framework for building and deploying agentic AI co-design flows as CHIA loops with tool nodes, reliability mechanisms, and five case-study demonstrations.
FFM finds optimal fused mappings for tensor accelerators over 10,000 times faster than prior mappers while cutting energy-delay product by up to 1.8x versus hand-tuned designs.
Mambalaya delivers 4.9x prefill and 1.9x generation speedups on Mamba layers over prior accelerators by systematically fusing inter-Einsum operations.
LUNA achieves up to 10.95x area reduction and 30% lower latency for qubit readout using integrator-based preprocessing and LogicNet LUT synthesis with minimal fidelity loss.
citing papers explorer
- LUNA: LUT-Based Neural Architecture for Fast and Low-Cost Qubit Readout
- Photonic Quantum Computing on Spin Memory Architecture with Tree-Encoded Fusion