Splitwise: Efficient Generative LLM Inference Using Phase Splitting

Canonical reference. 83% of citing Pith papers cite this work as background.

20 Pith papers citing it
Background 83% of classified citations

citation-role summary

background 5 · method 1

representative citing papers

Beyond Prediction: Tail-Aware Scheduling for LLM Inference

cs.LG · 2026-06-16 · unverdicted · novelty 7.0

Presents a distribution-aware scheduling framework for LLM inference that relies on statistical signals instead of predictions, reducing P99 TTLT by 35-50% and TTFT by 34-47% relative to SRPT with perfect length knowledge.

The Price of Anarchy in Disaggregated Inference

cs.AR · 2026-06-11 · unverdicted · novelty 7.0

Disaggregated inference is modeled as three games whose price of anarchy rises at GPU saturation; an adaptive controller reduces the empirical PoA-hat by up to 3.1x on real clusters at modest throughput cost.

Sparse Prefix Caching for Hybrid and Recurrent LLM Serving

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

Sparse prefix caching via dynamic programming for optimal checkpoint placement under overlap distributions improves the Pareto frontier for recurrent and hybrid LLM serving on shared-prefix data.

A Survey on Efficient Inference for Large Language Models

cs.CL · 2024-04-22 · accept · novelty 3.0

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.
