Embedding Dimension Lower Bounds for Universality of Deep Sets and Janossy Pooling
Pith reviewed 2026-05-12 01:53 UTC · model grok-4.3
The pith
Lower bounds show that Deep Sets need an embedding dimension that scales with the number of points to achieve universality when d > 1.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a novel technique, the authors prove new lower bounds on the embedding dimension needed to guarantee universality of Deep Sets and Janossy pooling. For Deep Sets, this gives the correct minimal dimension up to a constant factor for all d > 1. For k-ary Janossy pooling, the results supply the first non-trivial lower bound on the required embedding dimension when k > 1.
What carries the argument
Janossy pooling with a single fixed embedding dimension, which maps variable-sized point sets to a permutation-invariant representation before a final continuous function is applied.
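To make the architecture concrete, here is a minimal sketch of k-ary Janossy pooling as the review describes it: an embedding is summed over all ordered k-tuples of distinct points, and a final map is applied to the pooled vector. The names `phi` and `rho` and the toy power-sum embedding are illustrative assumptions, not the paper's construction; k = 1 recovers Deep Sets.

```python
import itertools
import numpy as np

def janossy_pool(X, phi, rho, k=1):
    """Janossy pooling: sum the embedding phi over all ordered k-tuples
    of distinct points, then apply the final continuous map rho.

    X   : (n, d) array, a point cloud of n points in R^d
    phi : maps a (k, d) block of points to an R^m embedding
    rho : map applied to the pooled R^m vector
    k = 1 recovers the Deep Sets architecture.
    """
    n = X.shape[0]
    pooled = sum(phi(X[list(idx)])
                 for idx in itertools.permutations(range(n), k))
    return rho(pooled)

# Deep Sets (k = 1) with a toy power-sum embedding (m = 3 * d here):
phi = lambda pts: np.concatenate([pts.sum(axis=0) ** p for p in (1, 2, 3)])
rho = lambda z: z  # identity readout
X = np.array([[0.1, 0.2], [0.3, 0.4]])

# Permutation invariance: shuffling the rows leaves the output unchanged.
assert np.allclose(janossy_pool(X, phi, rho), janossy_pool(X[::-1], phi, rho))
```

The question the paper addresses is how large the embedding dimension m must be, as a function of n, d, and k, for some choice of `phi` and `rho` to approximate every continuous invariant function.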
If this is right
- Deep Sets achieve universality only when the embedding dimension is at least linear in the set size for d > 1.
- k-ary Janossy pooling requires embedding dimensions that grow with both set size and arity k to reach universality.
- Architectures with embedding dimensions below these bounds cannot approximate all continuous invariant functions on point clouds.
- The bounds clarify the minimal width needed before the final network layer in these symmetric models.
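For contrast with the d > 1 regime these bullets concern, the baseline case d = 1 is settled: the first n power sums (embedding dimension m = n) already separate n-point multisets on the line. A small sketch of this classical fact, with illustrative names, assuming nothing from the paper beyond the standard construction:

```python
import numpy as np

def power_sum_embedding(x, m):
    """Deep Sets embedding for d = 1: the first m power sums of the points.
    With m >= n these separate n-point multisets, so the hard lower-bound
    questions in the paper concern d > 1."""
    return np.array([np.sum(x ** p) for p in range(1, m + 1)])

a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 1.0, 2.0])   # same multiset, permuted
c = np.array([1.0, 2.0, 4.0])   # a different multiset

assert np.allclose(power_sum_embedding(a, 3), power_sum_embedding(b, 3))
assert not np.allclose(power_sum_embedding(a, 3), power_sum_embedding(c, 3))
```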
Where Pith is reading between the lines
- These scaling requirements suggest that practical implementations for large point clouds may need wider embeddings than commonly used, potentially increasing memory costs.
- The lower-bound technique could be adapted to quantify embedding needs for other invariance groups beyond permutations.
- If the bounds are tight, then combining Janossy pooling with deeper final networks alone cannot compensate for insufficient embedding dimension.
Load-bearing premise
Universality is defined as uniform approximation of continuous functions on compact sets, and the network uses exactly the stated Janossy pooling form with one fixed embedding dimension.
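The premise above admits a standard formal rendering (this is the textbook definition, not a quotation from the paper): a family of invariant networks is universal if

```latex
% F is universal if, for every compact K \subset (\R^d)^n, every
% continuous permutation-invariant f : K \to \R, and every eps > 0,
\forall\, \varepsilon > 0 \;\; \exists\, g \in \mathcal{F}
\quad \text{such that} \quad
\sup_{x \in K} \bigl| f(x) - g(x) \bigr| < \varepsilon .
```

The lower bounds assert that when the embedding dimension is too small, some continuous invariant f admits no such approximant g.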
What would settle it
A concrete construction of a universal Janossy pooling network with embedding dimension strictly below the proven lower bound for some fixed n, d, and k, or a counterexample showing that the lower bound fails to hold under the uniform approximation definition.
Original abstract
In many practical applications it is important to build symmetries into neural network architectures. Consider the important case of permutation symmetry on point clouds consisting of $n$ points in $d$ dimensions. In this case the network learns a function on a set of $n$ points in $\mathbb{R}^d$, and a natural paradigm for constructing invariant networks is Janossy pooling, which generalizes the popular Deep Sets architecture. We study the universality of this approach, in particular the important question of how large the embedding dimension must be to guarantee universality of this architecture. Specifically, using a novel technique, we prove new lower bounds on the required size of this embedding dimension. For Deep Sets, this gives the correct minimal dimension up to a constant factor for all $d > 1$. For $k$-ary Janossy pooling, we prove the first non-trivial lower bound on the required embedding dimension when $k > 1$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper establishes lower bounds on the embedding dimension required to guarantee universality (uniform approximation of continuous functions on compact sets) for Deep Sets and k-ary Janossy pooling architectures that enforce permutation invariance on n-point clouds in R^d. Using a novel technique, it shows that the Deep Sets lower bound matches the known minimal dimension up to a constant factor for all d > 1, and derives the first non-trivial lower bound for the embedding dimension when k > 1.
Significance. If the stated bounds hold, the work provides important theoretical guidance on minimal embedding sizes for universal symmetric networks, closing the gap with known upper bounds for Deep Sets and initiating non-trivial analysis for higher-order Janossy pooling. The novel proof technique is a clear strength, as are the falsifiable predictions and the absence of free parameters or ad-hoc assumptions in the central claims.
Minor comments (2)
- The abstract and introduction should explicitly compare the new lower bounds to the best known upper bounds (e.g., via a table or inline constants) to make the 'up to a constant factor' claim immediately verifiable.
- Notation for the embedding dimension m, the Janossy pooling operator, and the function class being approximated should be introduced once in §2 and used consistently thereafter to improve readability.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of our work and for recommending minor revision. The referee's summary accurately captures the main contributions: new lower bounds on embedding dimension for universality of Deep Sets (matching known upper bounds up to constants for d > 1) and the first non-trivial bounds for k-ary Janossy pooling when k > 1. We appreciate the recognition of the novel proof technique and the absence of ad-hoc assumptions.
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper derives lower bounds on embedding dimension for Deep Sets and k-ary Janossy pooling universality via a novel functional-analysis technique. These bounds are obtained from the standard definition of universality (uniform approximation of continuous functions on compact sets) and the exact architecture form, without any reduction to fitted parameters, self-definitional equivalences, or load-bearing self-citations. The central claims rest on independent mathematical arguments that do not collapse to the inputs by construction, making the derivation self-contained.
Axiom & Free-Parameter Ledger
Axioms (2)
- Standard math: Continuous functions on compact sets can be uniformly approximated by neural networks under suitable conditions.
- Domain assumption: Permutation invariance is enforced exactly by the Janossy pooling construction.
Reference graph
Works this paper leans on
- [1] Nadav Dym and Steven J Gortler. Low-dimensional invariant embeddings for universal geometric learning. Foundations of Computational Mathematics, 25(2):375–415, 2025.
- [2] Jean-Pierre Kahane. Sur le théorème de superposition de Kolmogorov. Journal of Approximation Theory, 13(3):229–234, 1975.
- [3] Andrei Nikolaevich Kolmogorov. On the representations of continuous functions of many variables by superposition of continuous functions of one variable and addition. In Dokl. Akad. Nauk USSR, volume 114, pages 953–956, 1957.
- [4] Michael Levin. Dimension and superposition of continuous functions. Israel Journal of Mathematics, 70(2):205–218, 1990.
- [5] Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y Hou, and Max Tegmark. KAN: Kolmogorov-Arnold networks. arXiv preprint arXiv:2404.19756, 2024.
- [6] Haggai Maron, Ethan Fetaya, Nimrod Segol, and Yaron Lipman. On the universality of invariant networks. In International Conference on Machine Learning, pages 4363–4371. PMLR, 2019.
- [7] Jiří Matoušek. Using the Borsuk–Ulam theorem: lectures on topological methods in combinatorics and geometry. Springer, 2003.
- [8] R Murphy, B Srinivasan, V Rao, and B Ribeiro. Janossy pooling: Learning deep permutation-invariant functions for variable-size inputs. In International Conference on Learning Representations (ICLR 2019), 2019.
- [9] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017.
- [10] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30, 2017.
- [11] Y Sternfeld. Dimension, superposition of functions and separation of points, in compact metric spaces. Israel Journal of Mathematics, 50(1):13–53, 1985.
- [12] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- [13] Edward Wagstaff, Fabian B Fuchs, Martin Engelcke, Michael A Osborne, and Ingmar Posner. Universal approximation of functions on sets. Journal of Machine Learning Research, 23(151):1–56, 2022.
- [14] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Russ R Salakhutdinov, and Alexander J Smola. Deep sets. Advances in Neural Information Processing Systems, 30, 2017.