arxiv: 2606.08761 · 2 revisions
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing