A two-level overlapping Schwarz domain decomposition constructs a hierarchical attention operator that trains faster and approximates the inverse of a discretized 1D diffusion operator more accurately than global low-rank attention while using fewer parameters.
Parallel scalability of three-level FROSch preconditioners to 220000 cores using the Theta supercomputer. SIAM Journal on Scientific Computing, 44(4):C253–C278, 2022.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Hierarchical Attention via Domain Decomposition
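For readers unfamiliar with the underlying numerical method, the following is a minimal sketch of a classical two-level overlapping additive Schwarz approximation to the inverse of a discretized 1D diffusion operator. It is not the citing paper's attention operator: the function names, the subdomain count, the overlap width, and the piecewise-linear coarse space are all assumptions chosen for illustration.

```python
import numpy as np

def laplacian_1d(n):
    """Finite-difference matrix for -u'' on n interior points (Dirichlet BCs)."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def two_level_schwarz_inverse(A, n_sub=4, overlap=2):
    """Additive two-level Schwarz approximation of A^{-1} (illustrative only)."""
    n = A.shape[0]
    size = n // n_sub
    M_inv = np.zeros_like(A)

    # Level 1: exact local solves on overlapping index blocks.
    for s in range(n_sub):
        lo = max(0, s * size - overlap)
        hi = n if s == n_sub - 1 else min(n, (s + 1) * size + overlap)
        idx = np.arange(lo, hi)
        M_inv[np.ix_(idx, idx)] += np.linalg.inv(A[np.ix_(idx, idx)])

    # Level 2: coarse correction spanned by hat functions centered at the
    # interior subdomain interfaces (an assumed coarse space).
    x = np.arange(1, n + 1) / (n + 1)
    nodes = np.arange(1, n_sub) / n_sub
    H = 1.0 / n_sub
    P = np.maximum(0.0, 1.0 - np.abs(x[:, None] - nodes[None, :]) / H)
    A0 = P.T @ A @ P  # Galerkin coarse operator
    M_inv += P @ np.linalg.solve(A0, P.T)
    return M_inv

A = laplacian_1d(64)
M_inv = two_level_schwarz_inverse(A)
eigs = np.linalg.eigvals(M_inv @ A).real
print("cond(A):", np.linalg.cond(A))
print("cond(M_inv A):", eigs.max() / eigs.min())
```

The local solves capture short-range coupling, while the coarse correction transports information across subdomains. That division of labor is the structure a hierarchical operator of this kind would be expected to mirror.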