Hetu v2: A general and scalable deep learning system with hierarchical and heterogeneous single program multiple data annotations
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
fields
cs.DC 2
verdicts
UNVERDICTED 2
roles
background 1
polarities
background 1
representative citing papers
HARP provides a fine-grained inter-operator parallel planner and a heterogeneity-aware 1F1B scheduler that together improve training throughput by 1.3x-1.6x on mixed GPU clusters compared with current homogeneous-oriented frameworks.
citing papers explorer
- HexiSeq: Accommodating Long Context Training of LLMs over Heterogeneous Hardware
  HexiSeq optimizes sequence and head partitioning across mixed GPUs to improve long-context LLM training throughput by up to 1.72x in simulations.
- HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters
  HARP provides a fine-grained inter-operator parallel planner and a heterogeneity-aware 1F1B scheduler that together improve training throughput by 1.3x-1.6x on mixed GPU clusters compared with current homogeneous-oriented frameworks.