pith. machine review for the scientific record.

arxiv: 2605.08377 · v1 · submitted 2026-05-08 · 💻 cs.LG · stat.ML

Recognition: no theorem link

Embedding Dimension Lower Bounds for Universality of Deep Sets and Janossy Pooling

Aditya Nambiar, Ali Syed, Jonathan W. Siegel

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:53 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords deep sets · janossy pooling · permutation invariance · universal approximation · embedding dimension · point clouds · neural networks · symmetric architectures

The pith

Lower bounds prove that Deep Sets need an embedding dimension that scales with the number of points to achieve universality when d > 1.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes lower bounds on the embedding dimension required for Janossy pooling networks to universally approximate continuous permutation-invariant functions on point clouds of n points in R^d. For the special case of Deep Sets, these bounds match the known minimal dimension up to a constant factor when d exceeds 1. For general k-ary Janossy pooling with k larger than 1, the work gives the first nontrivial lower bounds on this dimension. A sympathetic reader would care because these results quantify the capacity limits of symmetry-preserving architectures commonly used for sets and point clouds, showing that small fixed embeddings cannot suffice for full expressivity.

Core claim

Using a novel technique, the authors prove new lower bounds on the embedding dimension needed to guarantee universality of Deep Sets and Janossy pooling. For Deep Sets, this gives the correct minimal dimension up to a constant factor for all d > 1. For k-ary Janossy pooling, the results supply the first non-trivial lower bound on the required embedding dimension when k > 1.

What carries the argument

Janossy pooling with a single fixed embedding dimension, which maps variable-sized point sets to an invariant representation before a final continuous function is applied.
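
To make this pooled form concrete, here is a minimal NumPy sketch of sum-based k-ary Janossy pooling with a fixed embedding dimension. It assumes the standard sum-over-ordered-k-tuples convention; the names janossy_pool, phi, and rho are illustrative and do not come from the paper, and k = 1 recovers the Deep Sets special case.

```python
import itertools
import numpy as np

def janossy_pool(points, phi, rho, k=1):
    # points: (n, d) array holding one point cloud.
    # phi:    embedding applied to each ordered k-tuple (flattened to a k*d vector),
    #         returning a vector of fixed embedding dimension m.
    # rho:    continuous readout applied to the pooled embedding.
    # Summing phi over all ordered k-tuples of distinct points makes the result
    # invariant to permutations of the n input points; k = 1 is Deep Sets.
    n = points.shape[0]
    pooled = sum(phi(points[list(idx)].reshape(-1))
                 for idx in itertools.permutations(range(n), k))
    return rho(pooled)

# Toy Deep Sets instance: d = 2 coordinates, embedding dimension m = 4.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))
phi = lambda x: np.tanh(W @ x)        # per-point embedding into R^4
rho = lambda z: float(z.sum())        # continuous readout
cloud = rng.standard_normal((5, 2))   # n = 5 points in R^2

print(janossy_pool(cloud, phi, rho, k=1))
print(janossy_pool(cloud[::-1], phi, rho, k=1))  # same value: point order is irrelevant
```

The question the paper answers is how large this fixed embedding dimension must be, over all admissible choices of phi and rho, for the form to approximate every continuous permutation-invariant target.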

If this is right

  • Deep Sets achieve universality only when the embedding dimension is at least linear in the set size for d > 1 (see the schematic rendering after this list).
  • k-ary Janossy pooling requires embedding dimensions that grow with both set size and arity k to reach universality.
  • Architectures with embedding dimensions below these bounds cannot approximate all continuous invariant functions on point clouds.
  • The bounds clarify the minimal width needed before the final network layer in these symmetric models.
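
Schematically (an editorial rendering, not the paper's exact statement), the first bullet concerns the Deep Sets form

$$ f(x_1, \dots, x_n) = \rho\Big(\sum_{i=1}^{n} \phi(x_i)\Big), \qquad \phi : \mathbb{R}^d \to \mathbb{R}^m, $$

and asserts that universality over continuous permutation-invariant targets on $n$-point clouds in $\mathbb{R}^d$ with $d > 1$ forces $m = \Omega(n)$; the precise constants are the paper's contribution and are not reproduced here.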

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • These scaling requirements suggest that practical implementations for large point clouds may need wider embeddings than commonly used, potentially increasing memory costs.
  • The lower-bound technique could be adapted to quantify embedding needs for other invariance groups beyond permutations.
  • If the bounds are tight, then combining Janossy pooling with deeper final networks alone cannot compensate for insufficient embedding dimension.

Load-bearing premise

Universality is defined as uniform approximation of continuous functions on compact sets, and the network uses exactly the stated Janossy pooling form with one fixed embedding dimension.
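
Spelled out (a standard phrasing, not a quotation from the paper): for a compact $K \subset (\mathbb{R}^d)^n$, the class with embedding dimension $m$ is universal if for every continuous permutation-invariant $f : K \to \mathbb{R}$ and every $\varepsilon > 0$ there exist an admissible embedding $\phi$ and readout $\rho$ with

$$ \sup_{X \in K} \Big| f(X) - \rho\Big(\sum_{(i_1, \dots, i_k)} \phi(x_{i_1}, \dots, x_{i_k})\Big) \Big| < \varepsilon, $$

where the sum runs over ordered $k$-tuples of distinct indices ($k = 1$ recovers Deep Sets). This is one common convention for the Janossy sum; the paper's own form may index or normalize it differently. The lower bounds state how large $m$ must be before such a pair $(\phi, \rho)$ can exist for every target.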

What would settle it

A concrete construction of a universal Janossy pooling network with embedding dimension strictly below the proven lower bound for some fixed n, d, and k, or a counterexample showing that the lower bound fails to hold under the uniform approximation definition.
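
One way to see what such a construction must overcome (an illustrative sketch, not the paper's argument): if a particular embedding with sum pooling sends two distinct point clouds to the same pooled vector, no continuous readout can separate them, which already certifies that that specific embedding cannot support universality. The paper's bounds make the stronger, uniform statement over every embedding of the given dimension.

```python
import numpy as np

# A collision certificate (illustrative only, not the paper's construction):
# with d = 1, n = 2 and a sum-pooled embedding of dimension m = 1, the identity
# embedding phi(x) = [x] sends the distinct multisets {0, 2} and {1, 1} to the
# same pooled vector, so no continuous readout rho can separate them.
phi = lambda x: np.array([x])
pool = lambda xs: sum(phi(x) for x in xs)

a, b = [0.0, 2.0], [1.0, 1.0]
print(pool(a), pool(b))   # [2.] [2.]  -- identical pooled embeddings
print(max(a), max(b))     # 2.0 1.0    -- an invariant target that separates the clouds
# Any rho(pool(.)) takes the same value on both clouds, so it misses one of these
# targets by at least 0.5; this rules out this particular phi. The paper's bounds
# rule out every phi whose embedding dimension falls below the stated threshold.
```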

Figures

Figures reproduced from arXiv: 2605.08377 by Aditya Nambiar, Ali Syed, Jonathan W. Siegel.

Figure 1
For k = 2, the black points represent the base grid G2, while the red points represent the axis subset A. A non-axis grid point (y1, y2) is the top vertex of the rectangle whose other vertices (0, 0), (y1, 0), and (0, y2) all lie in A.
Figure 2
Case d = 2, n = 2, k = 1. The tail block has dimension d(n − k) = 2, so the perturbation directions form S^1. The circle is covered by antipodal-free regions B1, B2, B3.
read the original abstract

In many practical applications it is important to build symmetries into neural network architectures. Consider the important case of permutation symmetry on point clouds consisting of $n$ points in $d$ dimensions. In this case the network learns a function on a set of $n$ points in $\mathbb{R}^d$, and a natural paradigm for constructing invariant networks is Janossy pooling, which generalizes the popular Deep Sets architecture. We study the universality of this approach, in particular the important question of how large the embedding dimension must be to guarantee universality of this architecture. Specifically, using a novel technique, we prove new lower bounds on the required size of this embedding dimension. For Deep Sets, this gives the correct minimal dimension up to a constant factor for all $d > 1$. For $k$-ary Janossy pooling, we prove the first non-trivial lower bound on the required embedding dimension when $k > 1$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper establishes lower bounds on the embedding dimension required to guarantee universality (uniform approximation of continuous functions on compact sets) for Deep Sets and k-ary Janossy pooling architectures that enforce permutation invariance on n-point clouds in R^d. Using a novel technique, it shows that the Deep Sets lower bound matches the known minimal dimension up to a constant factor for all d > 1, and derives the first non-trivial lower bound for the embedding dimension when k > 1.

Significance. If the stated bounds hold, the work provides important theoretical guidance on minimal embedding sizes for universal symmetric networks, closing the gap with known upper bounds for Deep Sets and initiating non-trivial analysis for higher-order Janossy pooling. The novel proof technique is a clear strength, as are the falsifiable predictions and the absence of free parameters or ad-hoc assumptions in the central claims.

minor comments (2)
  1. The abstract and introduction should explicitly compare the new lower bounds to the best known upper bounds (e.g., via a table or inline constants) to make the 'up to a constant factor' claim immediately verifiable.
  2. Notation for the embedding dimension m, the Janossy pooling operator, and the function class being approximated should be introduced once in §2 and used consistently thereafter to improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our work and for recommending minor revision. The referee's summary accurately captures the main contributions: new lower bounds on embedding dimension for universality of Deep Sets (matching known upper bounds up to constants for d > 1) and the first non-trivial bounds for k-ary Janossy pooling when k > 1. We appreciate the recognition of the novel proof technique and the absence of ad-hoc assumptions.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives lower bounds on embedding dimension for Deep Sets and k-ary Janossy pooling universality via a novel technique. These bounds are obtained from the standard definition of universality (uniform approximation of continuous functions on compact sets) and the exact architecture form, without any reduction to fitted parameters, self-definitional equivalences, or load-bearing self-citations. The central claims rest on independent mathematical arguments that do not collapse to the inputs by construction, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard mathematical background for continuous function approximation and permutation invariance; no free parameters, invented entities, or ad-hoc axioms are indicated in the abstract.

axioms (2)
  • standard math: Continuous functions on compact sets can be uniformly approximated by neural networks under suitable conditions.
    Implicit in the definition of universality used throughout the abstract.
  • domain assumption: Permutation invariance is enforced exactly by the Janossy pooling construction.
    Central to the architecture whose universality is being bounded.

pith-pipeline@v0.9.0 · 5463 in / 1274 out tokens · 45465 ms · 2026-05-12T01:53:08.749168+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 1 internal anchor

  1. [1] Nadav Dym and Steven J. Gortler. Low-dimensional invariant embeddings for universal geometric learning. Foundations of Computational Mathematics, 25(2):375–415, 2025.

  2. [2] Jean-Pierre Kahane. Sur le théorème de superposition de Kolmogorov. Journal of Approximation Theory, 13(3):229–234, 1975.

  3. [3] Andrei Nikolaevich Kolmogorov. On the representations of continuous functions of many variables by superposition of continuous functions of one variable and addition. In Dokl. Akad. Nauk USSR, volume 114, pages 953–956, 1957.

  4. [4] Michael Levin. Dimension and superposition of continuous functions. Israel Journal of Mathematics, 70(2):205–218, 1990.

  5. [5] Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, and Max Tegmark. KAN: Kolmogorov–Arnold networks. arXiv preprint arXiv:2404.19756, 2024.

  6. [6] Haggai Maron, Ethan Fetaya, Nimrod Segol, and Yaron Lipman. On the universality of invariant networks. In International Conference on Machine Learning, pages 4363–4371. PMLR, 2019.

  7. [7] Jiří Matoušek. Using the Borsuk–Ulam Theorem: Lectures on Topological Methods in Combinatorics and Geometry. Springer, 2003.

  8. [8] R. Murphy, B. Srinivasan, V. Rao, and B. Ribeiro. Janossy pooling: Learning deep permutation-invariant functions for variable-size inputs. In International Conference on Learning Representations (ICLR 2019), 2019.

  9. [9] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017.

  10. [10] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30, 2017.

  11. [11] Y. Sternfeld. Dimension, superposition of functions and separation of points, in compact metric spaces. Israel Journal of Mathematics, 50(1):13–53, 1985.

  12. [12] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.

  13. [13] Edward Wagstaff, Fabian B. Fuchs, Martin Engelcke, Michael A. Osborne, and Ingmar Posner. Universal approximation of functions on sets. Journal of Machine Learning Research, 23(151):1–56, 2022.

  14. [14] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Russ R. Salakhutdinov, and Alexander J. Smola. Deep sets. Advances in Neural Information Processing Systems, 30, 2017.