A 28nm 20.9-137.2 TOPS/W Output-Stationary SRAM Compute-in-Memory Macro Featuring Dynamic Look-Ahead Zero Weight Skipping and Runtime Partial Sum Quantization

2025

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

FusionCIM: Accelerating LLM Inference with Fusion-Driven Computing-in-Memory Architecture

cs.AR · 2026-04-28 · unverdicted · novelty 6.0 · ref 19

FusionCIM is a fusion-driven CIM accelerator for LLM inference that maps QK^T to IP-CIM and PV to OP-CIM, uses QO-stationary dataflow, and applies pattern-aware online softmax, delivering up to 3.86x energy savings and 1.98x speedup on LLaMA-3 at 29.4 TOPS/W.
