FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Dao, T · 2024

1 Pith paper cites this work. Polarity classification is still being indexed.

Representative citing papers

Training-Inference Consistent Segmented Execution for Long-Context LLMs

cs.CL · 2026-05-12 · conditional · novelty 6.0

A training-inference consistent segmented execution framework for long-context LLMs matches full-context performance with substantially lower peak memory at very long lengths.

This work appears as reference 42 in the citing paper.