Functional Attention: From Pairwise Affinities to Functional Correspondences

Daniel Cremers; Guandao Yang; Jiefang Xiao; Maolin Gao; Simon Weber

arxiv: 2605.31559 · v1 · pith:PTX5FNBWnew · submitted 2026-05-29 · 💻 cs.LG

Jiefang Xiao, Maolin Gao, Simon Weber, Guandao Yang, Daniel Cremers

Pith reviewed 2026-06-28 23:15 UTC · model grok-4.3

classification 💻 cs.LG

keywords: functional attention · operator learning · functional maps · resolution invariance · transformer · PDE solving · 3D segmentation

The pith

Functional Attention reinterprets attention as structured linear operators between adaptive bases to enable resolution-invariant operator learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to address limitations in transformer-based operator learning where continuous fields are treated as discrete tokens and attention relies on pairwise softmax affinities that ignore global functional structure. It proposes replacing those affinities with structured linear operators that establish correspondences between adaptive bases, producing a representation that is compact, captures global dependencies explicitly, and does not depend on a particular discretization. A sympathetic reader would care because many scientific and engineering tasks involve learning mappings between infinite-dimensional function spaces, such as PDE solutions or shape analysis, where grid changes or the need for global consistency currently force retraining or loss of accuracy. If the approach holds, models could be trained once and applied across varying resolutions while maintaining performance on tasks like PDE solving and 3D segmentation.

Core claim

Functional Attention reinterprets attention as a functional correspondence between adaptive bases. Inspired by geometric functional maps, the method replaces softmax affinities with structured linear operators, yielding a compact, generalizable, resolution-invariant representation that explicitly captures global dependencies.

What carries the argument

Functional Attention, the replacement of token-wise softmax affinities by structured linear operators that compute functional correspondences between adaptive bases.
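
As a rough illustration of the mechanism, the NumPy sketch below follows one plausible reading of the Figure 1 caption and Lemma A.2: project Q, K, V onto a k-dimensional basis, solve a Tikhonov-regularized least-squares problem for a k×k map C, and lift the result back to the n sample points. It is not the authors' implementation. The function name `functional_attention_sketch`, the single fixed basis `Phi`, the value of `lam`, and the exact form of C are all assumptions made for illustration; the paper describes its bases as adaptive and learned.

```python
import numpy as np

def functional_attention_sketch(Q, K, V, Phi, lam=1e-2):
    """Hypothetical functional-map style attention (illustrative only).

    Q, K, V : (n, d) arrays of per-point features.
    Phi     : (n, k) basis sampled at the same n points (assumed given here).
    lam     : Tikhonov regularization strength (assumed value).
    """
    k = Phi.shape[1]
    Phi_pinv = np.linalg.pinv(Phi)                         # (k, n) analysis operator
    Qh, Kh, Vh = Phi_pinv @ Q, Phi_pinv @ K, Phi_pinv @ V  # (k, d) coefficients
    # C = Qh Kh^T (Kh Kh^T + lam I_k)^{-1}; G is symmetric, so solve, then transpose.
    G = Kh @ Kh.T + lam * np.eye(k)
    C = np.linalg.solve(G, Kh @ Qh.T).T                    # (k, k) linear map
    return Phi @ (C @ Vh)                                  # lifted back to (n, d)

rng = np.random.default_rng(0)
n, d, k = 256, 8, 16
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
Phi, _ = np.linalg.qr(rng.standard_normal((n, k)))         # orthonormal columns
out = functional_attention_sketch(Q, K, V, Phi)
print(out.shape)  # (256, 8)
```

Under these assumptions no n×n matrix is ever formed, so for fixed k and d the cost grows linearly with n, which is at least consistent with the scaling described in the Figure 5 caption. With a full orthonormal basis (k = n) the sketch reduces to the operator $QK^\top(KK^\top + \lambda I_n)^{-1}V$ from Lemma A.2.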

If this is right

The method can match state-of-the-art performance on PDE solving, 3D segmentation, and regression tasks.
Performance remains stable under changes in input discretization.
The learned representation is compact and explicitly encodes global functional dependencies rather than local token affinities.
The approach applies across multiple operator-learning domains without requiring task-specific redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Models trained at one resolution could be evaluated directly at another without fine-tuning or architectural changes.
The explicit use of adaptive bases opens a route to hybrid methods that inject classical functional-analysis constraints into learned operators.
The same linear-operator view might be inserted into other attention-based architectures that currently process continuous data as fixed tokens.

Load-bearing premise

That structured linear operators between adaptive bases can replace softmax affinities while preserving performance and delivering resolution invariance plus explicit global dependency capture.

What would settle it

A controlled experiment on a PDE operator-learning benchmark where Functional Attention either underperforms current transformer baselines or shows clear accuracy drop when the input discretization is changed after training.
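
The experiment described above amounts to a resolution sweep: train once, then score the same model on coarser and finer discretizations using the relative L2 error that the paper's PDE figures report. The harness below is a hypothetical sketch. The names `resolution_sweep` and `make_sample`, the toy task, and the stand-in model are illustrative assumptions and are not taken from the paper or its code.

```python
import numpy as np

def relative_l2(pred, target):
    """Relative L2 error ||pred - target|| / ||target||."""
    return np.linalg.norm(pred - target) / np.linalg.norm(target)

def resolution_sweep(model, make_sample, resolutions):
    """Score one fixed model on samples drawn at several resolutions."""
    errors = {}
    for n in resolutions:
        coords, inputs, target = make_sample(n)
        errors[n] = relative_l2(model(coords, inputs), target)
    return errors

def make_sample(n):
    """Toy task (assumed): map u(x) = sin(2*pi*x) to 2*u(x) on n grid points."""
    x = np.linspace(0.0, 1.0, n)
    u = np.sin(2 * np.pi * x)
    return x, u, 2.0 * u

def toy_model(coords, inputs):
    """Stand-in for a trained operator; slightly biased on purpose."""
    return 2.02 * inputs

errors = resolution_sweep(toy_model, make_sample, [32, 64, 128, 256])
for n, err in errors.items():
    print(n, round(float(err), 4))
```

A flat error curve across resolutions is what the invariance claim predicts. A clear rise away from the training resolution, or a gap to transformer baselines at the training resolution, would count against it.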

Figures

Figures reproduced from arXiv: 2605.31559 by Daniel Cremers, Guandao Yang, Jiefang Xiao, Maolin Gao, Simon Weber.

**Figure 1.** Architecture Overview. Top: Input functions are encoded by MLP, processed through N FUNCATTN blocks, and decoded by MLP. Bottom: In each FUNCATTN Module, Q, K, V are transformed to the spectral domain where cross-space attention computes optimal linear mapping C, then inverse-transformed. Purple blocks denote learnable layers. MLP, LN, and FFN stand for Multi-Layer Perceptron, Layer Norm, and Feed-Forward …

**Figure 2.** Few-shot sinusoidal regression. (Top) Predictions at initialization and after training on data with context length = 4 (black dots). Ground truth shown as a gray dotted line. k in FUNCATTN and #slices in Transolver are set to 2. (Bottom) Generalization performance (MSE) across varying context sizes. Our method achieves the lowest MSE and scales most effectively with increasing context size.

**Figure 3.** PDE solving visualization. Ground truth and error maps for Elasticity and Darcy benchmarks. Our method achieves lower error (relative L2, ×100) in both domains.
…nates of point clouds as input. Tab. 2 summarizes the segmentation accuracy. FUNCATTN achieves the highest accuracy, outperforming both classical point cloud architectures, e.g. PointNet++, and recent operator-based approaches, e.g. DiffusionNet a…

**Figure 4.** Overall design of Transolver (Wu et al., 2024) and FUNCATTN.
Background: Intention. Intention (Garnelo & Czarnecki, 2023) was proposed as an attention mechanism capable of representing regularized least squares fitting. Given queries $Q \in \mathbb{R}^{n \times d}$, keys $K \in \mathbb{R}^{n \times d}$, and values $V \in \mathbb{R}^{n \times d}$, Intention computes $\mathrm{Intention}(Q, K, V) = Q(K^\top K + \lambda I_d)^{-1} K^\top V$ (Eq. 46). Functional Attention Recovers Intention. We now show that In…

**Figure 5.** Runtime and memory scaling. Forward-pass time (left) and peak GPU memory (right) plotted against sequence length n, with d = 128, k = 64. Softmax attention grows quadratically, whereas FUNCATTN exhibits the predicted linear scaling and outperforms other linear-attention baselines at large n.
B.2. Empirical Runtime and Memory Scaling. To complement the theoretical analysis, we benchmark the forward-pass runtime and…

**Figure 6.** Condition number of the inverted matrix in Eq. (8) during training on Elasticity, comparing the Tikhonov-stabilized pseudoinverse and the transpose.

**Figure 7.** Average condition number of the inverted matrix in Eq. (8) during training on Elasticity, for different initializations of α.

**Figure 8.** Visualization of learned basis for different models.
… in the main text. This reveals where each model struggles, such as near boundaries or in regions with sharp gradients.

**Figure 9.** Prediction Visualizations. (Top) Darcy flow solution fields. (Bottom) Elasticity stress fields on irregular meshes. Each shows ground truth, Transolver, and FUNCATTN with error maps.

**Figure 10.** Prediction Visualizations. (Top) Airfoil velocity fields. (Bottom) Navier-Stokes vorticity fields at t = 20 after rollout.

**Figure 11.** Prediction Visualizations. Plasticity displacement magnitude fields at the final timestep.

**Figure 12.** Prediction Visualizations. Pipe flow velocity fields on irregular meshes.
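
The Intention formula quoted in the Figure 4 caption solves a d×d system, while the operator in Lemma A.2 (quoted in the reference graph) solves an n×n one. For λ > 0 the two agree by the standard push-through identity $(K^\top K + \lambda I_d)^{-1} K^\top = K^\top (K K^\top + \lambda I_n)^{-1}$. A small numerical check might look like the following; the function names `intention` and `dual_form`, the sizes, and the value of λ are arbitrary choices for illustration.

```python
import numpy as np

def intention(Q, K, V, lam):
    """Q (K^T K + lam I_d)^{-1} K^T V, using a d x d solve."""
    d = K.shape[1]
    return Q @ np.linalg.solve(K.T @ K + lam * np.eye(d), K.T @ V)

def dual_form(Q, K, V, lam):
    """Q K^T (K K^T + lam I_n)^{-1} V, using an n x n solve."""
    n = K.shape[0]
    return Q @ K.T @ np.linalg.solve(K @ K.T + lam * np.eye(n), V)

rng = np.random.default_rng(0)
n, d, lam = 50, 4, 0.1
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

# Largest entrywise difference between the two forms.
gap = np.max(np.abs(intention(Q, K, V, lam) - dual_form(Q, K, V, lam)))
print(gap < 1e-8)
```

When d is much smaller than n, the d×d form is the cheaper of the two, since it avoids building and factoring an n×n Gram matrix.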

Original abstract

Learning mappings between infinite-dimensional function spaces, or operator learning, is essential for many machine learning applications. Although transformer-based operators are popular, they often rely on token-wise attention. These methods treat continuous fields as discrete tokens and usually ignore the global functional structure. We introduce Functional Attention, which reinterprets attention as a functional correspondence between adaptive bases. Inspired by geometric functional maps, our method replaces softmax affinities with structured linear operators. This yields a compact, generalizable, resolution-invariant representation that explicitly captures global dependencies. Experiments demonstrate that Functional Attention can match state-of-the-art performance in many operator learning tasks, including solving PDEs, 3D segmentation, and regression, while remaining robust to varying discretizations. The project page is available at https://github.com/xjffff/FUNCATTN.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Functional Attention reinterprets attention via geometric functional maps for resolution-invariant operator learning, but the abstract supplies no experimental details to back the SOTA claims.

The core idea is to replace token-wise softmax attention with structured linear operators between adaptive bases, drawing from geometric functional maps, so the mechanism stays invariant to discretization and captures global functional structure in operator learning.

This is new in the operator-learning setting. Prior work on functional maps is mostly in geometry processing, and applying it here to attention for PDEs and 3D tasks is a fresh angle. The motivation is solid: standard transformers treat fields as discrete tokens and lose the continuous nature of the problem.

The paper does a reasonable job laying out why global dependencies matter and why resolution invariance would help in scientific ML. If the math on the linear operators is clean, that part could be useful.

The main weakness is that the abstract asserts matching SOTA performance and robustness across PDE solving, 3D segmentation, and regression, yet gives zero methods, metrics, baselines, or error bars. The central assumption—that swapping softmax for these structured operators preserves performance while adding invariance—cannot be checked from the given text. Without seeing the implementation or results, the claim stays untested.

This is for readers already working on operator learning or geometric methods in ML. Someone looking for a new attention variant in that niche might find the idea worth discussing, but only if the full paper has reproducible experiments.

I would send it to peer review so the experiments and any code can be examined properly.

Referee Report

1 major / 0 minor

Summary. The paper introduces Functional Attention for operator learning between infinite-dimensional function spaces. It reinterprets standard token-wise attention as a functional correspondence between adaptive bases, replacing softmax affinities with structured linear operators inspired by geometric functional maps. This is claimed to produce a compact, generalizable, resolution-invariant representation that explicitly captures global dependencies. The abstract states that experiments show the method matches state-of-the-art performance on PDE solving, 3D segmentation, and regression tasks while remaining robust to varying discretizations.

Significance. If the method delivers resolution invariance and global dependency capture while matching SOTA without hidden parameter costs, it could meaningfully advance attention-based operator learning by grounding it in functional analysis and geometric correspondences. The explicit avoidance of discrete tokenization is a potentially valuable direction for continuous fields.

major comments (1)

[Abstract] The assertion that 'Experiments demonstrate that Functional Attention can match state-of-the-art performance in many operator learning tasks...' supplies no methods, metrics, baselines, error bars, datasets, or implementation details, so the central experimental claim cannot be evaluated from the manuscript text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and the opportunity to respond. We address the single major comment below.

Referee: [Abstract] The assertion that 'Experiments demonstrate that Functional Attention can match state-of-the-art performance in many operator learning tasks...' supplies no methods, metrics, baselines, error bars, datasets, or implementation details, so the central experimental claim cannot be evaluated from the manuscript text.

Authors: We thank the referee for noting this. The abstract is intentionally concise and serves only as a high-level overview; it is not the appropriate location for full experimental protocols, which would exceed typical length limits. The manuscript provides complete details on the experimental setup, including methods, metrics, baselines, error bars, datasets, and implementation, in Section 4 (Experiments) along with the appendix. The abstract claim is therefore grounded in those results. We do not view this as requiring a change to the abstract itself.

Revision: no

Circularity Check

0 steps flagged

No significant circularity detected

The provided abstract and description introduce Functional Attention as a reinterpretation of attention inspired by external geometric functional maps, replacing softmax with structured linear operators between adaptive bases. No equations, derivations, or self-citations are shown that reduce any central claim to fitted inputs or prior self-referential definitions by construction. The experimental claims are presented as validation rather than tautological predictions. This matches the default expectation of a self-contained method description without load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond the high-level proposal; the core idea rests on an unelaborated domain assumption about functional correspondences.

axioms (1)

Domain assumption: Structured linear operators between adaptive bases can replace softmax affinities while capturing global functional dependencies and enabling resolution invariance.
This premise is invoked to justify the replacement of standard attention and is not supported by details in the abstract.

pith-pipeline@v0.9.1-grok · 5678 in / 1128 out tokens · 24172 ms · 2026-06-28T23:15:06.205739+00:00 · methodology

Reference graph

Works this paper leans on

23 extracted references · 14 canonical work pages · 7 internal anchors

[1] Atzmon, M., Maron, H., and Lipman, Y. Point convolutional neural networks by extension operators. arXiv preprint arXiv:1803.10091.

[2] Chen, H., Liu, Z., Wang, X., Tian, Y., and Wang, Y. Dijiang: Efficient large language models through compact kernelization. arXiv preprint arXiv:2403.19928.

[3] Choromanski, K., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., Hawkins, P., Davis, J., Mohiuddin, A., Kaiser, L., et al. Rethinking attention with performers. arXiv preprint arXiv:2009.14794.

[4] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171–4186, 2019.

[5] Dosovitskiy, A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

[6] Gao, B. and Pavel, L. On the properties of the softmax function with application in game theory and reinforcement learning. arXiv preprint arXiv:1704.00805.

[7] Gao, H. and Ji, S. Graph U-Nets. In International Conference on Machine Learning, pp. 2083–2092. PMLR.

[8] Garnelo, M. and Czarnecki, W. M. Exploring the space of key-value-query models with intention. arXiv preprint arXiv:2305.10203.

[9] Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., and Anandkumar, A. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485.

[10] Li, Z., Huang, D. Z., Liu, B., and Anandkumar, A. Fourier neural operator with learned deformations for PDEs on general geometries. Journal of Machine Learning Research, 24(388):1–26, 2023a. Li, Z., Kovachki, N., Choy, C., Li, B., Kossaifi, J., Otta, S., Nabian, M. A., Stadler, M., Hundt, C., Azizzadenesheli, K., et al. Geometry-informed neural operato...

[11] Luo, H., Wu, H., Zhou, H., Xing, L., Di, Y., Wang, J., and Long, M. Transolver++: An accurate neural solver for PDEs on million-scale geometries. arXiv preprint arXiv:2502.02414.

[12] Tripura, T. and Chakraborty, S. Wavelet neural operator: a neural operator for parametric partial differential equations. arXiv preprint arXiv:2205.02191.

[13] Wang, S., Li, B. Z., Khabsa, M., Fang, H., and Ma, H. Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768.

[14] Wu, H., Luo, H., Wang, H., Wang, J., and Long, M. Transolver: A fast transformer solver for PDEs on general geometries. arXiv preprint arXiv:2402.02366.

[15] Yaras, C., Xu, A. S., Abillama, P., Lee, C., and Balzano, L. Monarchattention: Zero-shot conversion to fast, hardware-aware structured attention. arXiv preprint arXiv:2505.18698.

[16] Zhang, M., Bhatia, K., Kumbong, H., and Ré, C. The hedgehog & the porcupine: Expressive linear attentions with softmax mimicry. arXiv preprint arXiv:2402.04347.
[17] Lemma A.1 (Wu et al. (2024)). Suppose that $\Omega$ is a countable domain, the reduced domain $\Omega_{\mathrm{spec}}$ is isomorphic to $\Omega$. Lemma A.2. The operator $QK^\top(KK^\top + \lambda I_n)^{-1}V$ can be interpreted as a Monte-Carlo discretization of a regularized integral operator. Proof. Given input function $u: \Omega \to \mathbb{R}^C$, define the key Gram kernel $h(\xi, \xi') := k(\xi)^\top k(\xi')$ where $k(\xi) = W_k u(\xi)$, and the ass...

[18] … was proposed as an attention mechanism capable of representing regularized least squares fitting. Given queries $Q \in \mathbb{R}^{n \times d}$, keys $K \in \mathbb{R}^{n \times d}$, and values $V \in \mathbb{R}^{n \times d}$, Intention computes $\mathrm{Intention}(Q, K, V) = Q(K^\top K + \lambda I_d)^{-1} K^\top V$ (Eq. 46). Functional Attention Recovers Intention. We now show that Intention is a special case of Functional Attention when we choose any orthonormal basi...

[19] … where each task corresponds to a sinusoidal function $f(x) = \alpha \sin(x - \gamma)$ defined on $x \in [-6, 6]$. The amplitude $\alpha$ and phase $\gamma$ are sampled uniformly from $[0.1, 5]$ and $[0, \pi]$, respectively. For each task, we observe a support set of K randomly sampled input-output pairs. The goal is to learn a predictor that generalizes to arbitrary query locations given only the s...

[20] – – – Heads 4 8 8 8 Learning Rate $10^{-3}$ $10^{-4}$ $3 \times 10^{-4}$ $10^{-4}$ C.2. PDE Benchmarks. We benchmark our methods on eight popular PDE benchmarks across diverse geometries and physical scenarios. Table 9. Summary of benchmark datasets. Benchmark Input Spatial Resolution Input length Output Train/Test Elasticity Domain geometry Point cloud 972 Displacement u 1000/200 Ai...

[21] … contains high-fidelity simulation data for Reynolds-Averaged Navier-Stokes (RANS) equations, designed to assist airfoil design. The dataset features airfoils from the NACA 4- and 5-digit series, with each case discretized into approximately 32,000 mesh points. The simulation records air velocity, pressure, and viscosity in the surrounding space, as well a...

[22] … between predicted and ground truth coefficients across test samples, which measures how well the model preserves the ranking of designs, a key property for engineering optimization. Training details. We use a consistent architecture across all benchmarks with 8 transformer layers and 8 attention heads to match previous work. The hidden channel dimension is ...

[23] … without extra tuning. Lg denotes spatial gradient regularization (Xiao et al., 2024). Lv and Ls denote volume and surface losses respectively. Training Configuration Model Configuration Benchmark Loss Epochs LR Optim Batch Layers Heads Channels Modes Elasticity Rel. L2 500 $10^{-3}$ AdamW 1 8 8 128 64 Plasticity 8 8 8 128 64 Airfoil 4 8 8 128 64 Pipe 4 8 8 128...
