FlexGen: High-Throughput Generative Inference of Large Language Models with a Single

Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Beidi Chen · 2023

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Training-Inference Consistent Segmented Execution for Long-Context LLMs

cs.CL · 2026-05-12 · conditional · novelty 6.0

A training-inference consistent segmented execution framework for long-context LLMs matches full-context performance with substantially lower peak memory at very long lengths.

citing papers explorer

Showing 1 of 1 citing paper.

Training-Inference Consistent Segmented Execution for Long-Context LLMs cs.CL · 2026-05-12 · conditional · none · ref 17
A training-inference consistent segmented execution framework for long-context LLMs matches full-context performance with substantially lower peak memory at very long lengths.

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single

fields

years

verdicts

representative citing papers

citing papers explorer