Sandwich delivers 2.01x average end-to-end speedup and up to 3.4x latency reduction for CPU LLM serving via phase-wise hot-switching, TopoTree hardware abstraction, and fast-start dynamic kernel generation.
In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (La Jolla, CA, USA) (ASPLOS ’24)
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AR (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Sandwich: Joint Configuration Search and Hot-Switching for Efficient CPU LLM Serving