Hardware architecture and software stack for PIM based on commercial DRAM technology: Industrial product
2 Pith papers cite this work.
- AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization
AQPIM performs in-memory product quantization of activations for LLMs on PIM hardware, reducing GPU-CPU communication by 90-98.5% and delivering 3.4x speedup over prior PIM methods.
- UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM
UMDAM introduces a column-major, tile-based data layout and a configurable DRAM address mapping to enable efficient NPU-PIM co-execution for LLM inference, reducing time-to-first-token (TTFT) by up to 3.0x and time-to-last-token (TTLT) by 2.18x on OPT models without added memory overhead or bandwidth loss.