arxiv: 2510.03511 · v3 · pith:ZDFRURVOnew · submitted 2025-10-03 · 💻 cs.CV · cs.AI · cs.LG · eess.IV

Platonic Transformers: A Solid Choice For Equivariance

classification 💻 cs.CV · cs.AI · cs.LG · eess.IV
keywords platonic, geometric, transformer, transformers, attention, computer, cost, enables

While widespread, Transformers lack inductive biases for geometric symmetries common in science and computer vision. Existing equivariant methods often sacrifice the efficiency and flexibility that make Transformers so effective through complex, computationally intensive designs. We introduce the Platonic Transformer to resolve this trade-off. By defining attention relative to reference frames from the Platonic solid symmetry groups, our method induces a principled weight-sharing scheme. This enables combined equivariance to continuous translations and Platonic symmetries, while preserving the exact architecture and computational cost of a standard Transformer. Furthermore, we show that this attention is formally equivalent to a dynamic group convolution, which reveals that the model learns adaptive geometric filters and enables a highly scalable, linear-time convolutional variant. Across diverse benchmarks in computer vision (CIFAR-10), 3D point clouds (ScanObjectNN), and molecular property prediction (QM9, OMol25), the Platonic Transformer achieves competitive performance by leveraging these geometric constraints at no additional cost.
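
The abstract's link between weight sharing and group convolution can be illustrated with a small sketch. This is not the paper's implementation: it only shows the generic group-convolution pattern on the 12 rotations of the tetrahedron, one of the Platonic solid symmetry groups. All function names (`tetrahedral_rotations`, `group_conv_matrix`, `left_action`) are hypothetical, and the choice of generators is an assumption made for the example. The layer has one weight per group element, reused across rows as `W[g, h] = w[g^{-1} h]`, and the final check confirms that it commutes with every rotation acting on the group axis.

```python
import numpy as np

def tetrahedral_rotations():
    """Close two generators under multiplication to get the 12 tetrahedral rotations."""
    a = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])  # 120-degree turn about (1, 1, 1)
    b = np.diag([1, -1, -1])                          # 180-degree turn about the x-axis
    group = [np.eye(3, dtype=int)]
    frontier = list(group)
    while frontier:
        new = []
        for g in frontier:
            for s in (a, b):
                h = g @ s
                if not any(np.array_equal(h, k) for k in group):
                    group.append(h)
                    new.append(h)
        frontier = new
    return group

def index_of(m, group):
    """Position of matrix m in the list of group elements."""
    return next(i for i, k in enumerate(group) if np.array_equal(m, k))

def group_conv_matrix(w, group):
    """Weight-shared linear map W[g, h] = w[g^{-1} h] (a group convolution)."""
    n = len(group)
    W = np.empty((n, n))
    for i, g in enumerate(group):
        for j, h in enumerate(group):
            W[i, j] = w[index_of(g.T @ h, group)]  # g^{-1} = g^T for rotations
    return W

def left_action(u, group):
    """Permutation matrix P with (P f)[g] = f[u^{-1} g]."""
    n = len(group)
    P = np.zeros((n, n))
    for i, g in enumerate(group):
        P[i, index_of(u.T @ g, group)] = 1.0
    return P

group = tetrahedral_rotations()
rng = np.random.default_rng(0)
w = rng.normal(size=len(group))   # 12 free weights instead of 12 * 12
W = group_conv_matrix(w, group)

# Equivariance: rotating the input and then applying W equals applying W and then rotating.
equivariant = all(
    np.allclose(W @ left_action(u, group), left_action(u, group) @ W) for u in group
)
print(len(group), equivariant)
```

The sketch uses a fixed filter `w`. The abstract describes the attention as a dynamic group convolution, which would correspond to filters that depend on the input; that part is not modelled here.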

This paper has not been read by Pith yet.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Discretizing Group-Convolutional Neural Networks for 3D Geometry in Feature Space

    cs.CV 2026-05 unverdicted novelty 7.0

    Feature-space sampling in GCNNs preserves 3D classification accuracy with coarse discretization, enabling precomputation and faster training of equivariant models.

  2. Kernel-Gradient Drifting Models

    cs.LG 2026-05 unverdicted novelty 7.0

    Kernel-gradient drifting reformulates drifting models via kernel gradients to yield identifiable one-step generation with smoothed score matching and KL descent on Euclidean, Riemannian, and discrete spaces.

  3. Drawback of Enforcing Equivariance and its Compensation via the Lens of Expressive Power

    cs.LG 2025-12 unverdicted novelty 6.0

    Enforcing equivariance reduces expressive power in 2-layer ReLU networks but enlarging the model compensates with proven size bounds and yields lower hypothesis space dimensionality for better generalization.