Platonic Transformers: A Solid Choice For Equivariance
While widespread, Transformers lack inductive biases for geometric symmetries common in science and computer vision. Existing equivariant methods often sacrifice the efficiency and flexibility that make Transformers so effective through complex, computationally intensive designs. We introduce the Platonic Transformer to resolve this trade-off. By defining attention relative to reference frames from the Platonic solid symmetry groups, our method induces a principled weight-sharing scheme. This enables combined equivariance to continuous translations and Platonic symmetries, while preserving the exact architecture and computational cost of a standard Transformer. Furthermore, we show that this attention is formally equivalent to a dynamic group convolution, which reveals that the model learns adaptive geometric filters and enables a highly scalable, linear-time convolutional variant. Across diverse benchmarks in computer vision (CIFAR-10), 3D point clouds (ScanObjectNN), and molecular property prediction (QM9, OMol25), the Platonic Transformer achieves competitive performance by leveraging these geometric constraints at no additional cost.
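The abstract's core mechanism is attention computed relative to each reference frame of a Platonic symmetry group, with the same weights shared across frames. A small NumPy sketch can make this concrete. It assumes the octahedral rotation group (24 elements), group-lifted features with one channel slot per frame, and a hypothetical tanh positional bias on frame-local offsets. These choices, and the names `octahedral_rotations`, `frame_attention`, and `wpos`, are illustrative and do not come from the paper. Treat this as a sketch of the weight-sharing idea, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): frame-relative attention with shared weights.
import itertools

import numpy as np


def octahedral_rotations():
    """The 24 rotations of the cube/octahedron: signed permutation matrices with det +1."""
    mats = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product([1.0, -1.0], repeat=3):
            m = np.zeros((3, 3))
            for row, (col, s) in enumerate(zip(perm, signs)):
                m[row, col] = s
            if np.isclose(np.linalg.det(m), 1.0):
                mats.append(m)
    return np.stack(mats)  # shape (24, 3, 3)


def index_of(group, m):
    """Index of rotation matrix m within the stacked group."""
    return int(np.argmin(np.abs(group - m).sum(axis=(1, 2))))


def frame_attention(pos, feats, group, wq, wk, wv, wpos):
    """Single-head attention run once per reference frame R_g, with weights shared across frames.

    pos:   (N, 3) coordinates
    feats: (N, G, C) group-lifted features, one C-dim slot per frame
    wq, wk, wv: (C, D) projections shared by every frame
    wpos:  (3,) weights of a hypothetical positional bias on frame-local offsets
    """
    n, n_frames, _ = feats.shape
    rel = pos[None, :, :] - pos[:, None, :]  # rel[i, j] = x_j - x_i (translation invariant)
    out = np.empty((n, n_frames, wv.shape[1]))
    for g in range(n_frames):
        local = rel @ group[g]  # row-wise R_g^T (x_j - x_i): offsets expressed in frame g
        q, k, v = feats[:, g] @ wq, feats[:, g] @ wk, feats[:, g] @ wv
        # Any function of local offsets is allowed; tanh is deliberately not rotation-equivariant.
        logits = q @ k.T / np.sqrt(q.shape[1]) + np.tanh(local) @ wpos
        logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax over j
        attn = np.exp(logits)
        attn /= attn.sum(axis=1, keepdims=True)
        out[:, g] = attn @ v
    return out


rng = np.random.default_rng(0)
G = octahedral_rotations()
N, C, D = 5, 4, 4
pos = rng.normal(size=(N, 3))
feats = rng.normal(size=(N, len(G), C))
wq, wk, wv = (rng.normal(size=(C, D)) for _ in range(3))
wpos = rng.normal(size=3)
out = frame_attention(pos, feats, G, wq, wk, wv, wpos)

# Rotate the scene by R_h: frame g of the rotated input sees what frame h^{-1} g saw before.
h = 7
perm = [index_of(G, G[h].T @ G[g]) for g in range(len(G))]
out_rot = frame_attention(pos @ G[h].T, feats[:, perm], G, wq, wk, wv, wpos)
out_shift = frame_attention(pos + rng.normal(size=3), feats, G, wq, wk, wv, wpos)

rot_ok = np.allclose(out_rot, out[:, perm])
trans_ok = np.allclose(out_shift, out)
print(rot_ok, trans_ok)  # expected: True True
```

Because every frame applies the same weights to its own local offsets, rotating the input by a group element only relabels which frame sees which view. The output therefore permutes along the frame axis. Using relative offsets makes it translation invariant. The sketch omits cross-frame mixing layers and multi-head details. In it, each frame gets its own `C` channels, so the total feature width is `len(G) * C`.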
Forward citations
Cited by 3 Pith papers
- Discretizing Group-Convolutional Neural Networks for 3D Geometry in Feature Space
  Feature-space sampling in GCNNs preserves 3D classification accuracy with coarse discretization, enabling precomputation and faster training of equivariant models.
- Kernel-Gradient Drifting Models
  Kernel-gradient drifting reformulates drifting models via kernel gradients, yielding identifiable one-step generation with smoothed score matching and KL descent on Euclidean, Riemannian, and discrete spaces.
- Drawback of Enforcing Equivariance and its Compensation via the Lens of Expressive Power
  Enforcing equivariance reduces the expressive power of 2-layer ReLU networks, but enlarging the model compensates (with proven size bounds) and yields a lower-dimensional hypothesis space for better generalization.