Measuring the Symmetry–Data Exchange Rate

Ahmed M. Adly

arxiv: 2606.01090 · v1 · pith:IVR6SNQ5new · submitted 2026-05-31 · 📊 stat.ME · cs.LG

Pith reviewed 2026-06-28 16:47 UTC · model grok-4.3

classification: stat.ME · cs.LG

keywords: equivariance · symmetry prior · sample complexity · inductive bias · data augmentation · controlled experiment · exchange rate · wrong-group control

The pith

A misaligned symmetry prior harms performance more than having no symmetry prior at all.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to quantify the symmetry-data exchange rate, the factor by which an architectural symmetry prior reduces the samples needed to reach a target performance level. On a controlled cyclic-group symmetric task it introduces a wrong-group control that holds orbit size and compute fixed while breaking alignment with the task symmetry. This control produces reliably higher error than the unconstrained baseline. An augmentation baseline that adds test-time orbit averaging matches the equivariant architecture exactly, showing that the usual architecture-versus-augmentation difference is an artifact of asymmetric test-time computation. The primary relative-rate estimator yields a value near the theoretical prediction, though the design is exploratory and the headline result rests on a post-hoc choice of estimator.

Core claim

On a C_n-symmetric task a wrong-group control with identical orbit size is worse than no constraint, with the joint pairwise confidence interval excluding zero and robust across estimators. An augmentation baseline equipped with test-time orbit averaging produces bit-identical per-epoch validation curves to the equivariant model. The relative exchange rate beta_diff equals 1.28, consistent in sign and order of magnitude with the theoretical value of 1.0 under the single-level interval, while the more conservative two-level bootstrap includes zero.
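The test-time orbit averaging mentioned above can be illustrated with a minimal sketch. It uses the labeling rule 1[cos(nθ) > 0] from the Figure 1 caption; `toy_score` is a hypothetical stand-in for a trained network, and all function names are illustrative assumptions rather than the paper's code.

```python
import math

def label(x, y, n):
    """Task label 1[cos(n * theta) > 0] for the point (x, y)."""
    theta = math.atan2(y, x)
    return 1 if math.cos(n * theta) > 0 else 0

def rotate(x, y, angle):
    """Rotate (x, y) counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return c * x - s * y, s * x + c * y

def orbit_average(f, x, y, m):
    """Average f over the C_m orbit of (x, y), i.e. rotations by 2*pi*k/m."""
    total = 0.0
    for k in range(m):
        xr, yr = rotate(x, y, 2.0 * math.pi * k / m)
        total += f(xr, yr)
    return total / m

def toy_score(x, y):
    """Hypothetical non-invariant scalar model output (not from the paper)."""
    return 0.3 * x + 0.7 * y * y - 0.2 * x * y

# A point on an annulus of radius 1.5 at angle 0.2 rad, and its C_4 rotation.
n = 4
x, y = 1.5 * math.cos(0.2), 1.5 * math.sin(0.2)
xr, yr = rotate(x, y, 2.0 * math.pi / n)

# The raw score changes under the rotation; the orbit-averaged score does not.
print(toy_score(x, y), toy_score(xr, yr))
print(orbit_average(toy_score, x, y, n), orbit_average(toy_score, xr, yr, n))
```

Averaging any function over the full C_n orbit yields a C_n-invariant function, which is why an augmented model that also averages at test time can coincide with an architecture that averages internally.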

What carries the argument

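The relative-rate estimator beta_diff: the slope of log2(N_target^vanilla / N_target^treatment) against log2 n, which equals the gap between the vanilla and treatment slopes of log2 N_target against log2 n. Taking this difference cancels the task-difficulty confound shared by both models.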
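As a rough illustration, the estimator can be computed with an ordinary-least-squares fit, following the definition in the Figure 2 caption (slope of log2 of the vanilla-to-treatment sample ratio against log2 n). The sample counts below are synthetic numbers chosen so that the answer is known in advance; they are not the paper's data, and the function names are assumptions.

```python
import math

def ols_slope(xs, ys):
    """Ordinary-least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def beta_diff(group_sizes, n_target_vanilla, n_target_treatment):
    """Slope of log2(N_vanilla / N_treatment) against log2(n)."""
    xs = [math.log2(n) for n in group_sizes]
    ys = [math.log2(nv / nt)
          for nv, nt in zip(n_target_vanilla, n_target_treatment)]
    return ols_slope(xs, ys)

# Group sizes as listed in the Figure 1 caption.
group_sizes = [1, 2, 3, 4, 6, 8, 12]

# SYNTHETIC sample requirements: both grow with n (shared difficulty),
# but the treatment needs a factor of n fewer samples than vanilla.
n_vanilla = [50 * n ** 2 for n in group_sizes]
n_equivariant = [50 * n for n in group_sizes]

print(beta_diff(group_sizes, n_vanilla, n_equivariant))  # close to 1.0 by construction
```

Both synthetic curves have positive absolute slopes, yet the estimator recovers the relative saving, which is the property the paper relies on.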

If this is right

Misaligned symmetry constraints are actively harmful rather than merely unhelpful.
Architecture-versus-augmentation gaps disappear once test-time computation is equalized by orbit averaging.
The relative-rate estimator, wrong-group control, and failure taxonomy transfer to any inductive bias whose strength can be parameterized by a group size.
The exchange rate can be measured even when absolute difficulty varies across group sizes.
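
Why a difference of slopes suffices can be written out in a few lines. The sketch assumes, purely for illustration, that each model's sample requirement is log-linear in the group size; the intercepts a, the shared difficulty exponent d, and the model-specific exponents r are notation introduced here, not the paper's.

```latex
\begin{align*}
\log_2 N^{\mathrm{vanilla}}_{\mathrm{target}}(n)
  &= a_{\mathrm{v}} + (d + r_{\mathrm{v}})\,\log_2 n, \\
\log_2 N^{\mathrm{treatment}}_{\mathrm{target}}(n)
  &= a_{\mathrm{t}} + (d + r_{\mathrm{t}})\,\log_2 n, \\
\log_2 \frac{N^{\mathrm{vanilla}}_{\mathrm{target}}(n)}
            {N^{\mathrm{treatment}}_{\mathrm{target}}(n)}
  &= (a_{\mathrm{v}} - a_{\mathrm{t}})
     + (r_{\mathrm{v}} - r_{\mathrm{t}})\,\log_2 n .
\end{align*}
```

The shared exponent d drops out, so the fitted slope beta_diff = r_v - r_t does not depend on how fast absolute difficulty grows with n. Under the theoretical prediction that the prior saves a factor of |G| = n, the treatment needs N_vanilla / n samples and the slope is 1.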

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same wrong-group control design could be applied to test whether other misaligned priors, such as incorrect invariance or sparsity patterns, are actively detrimental.
If the exchange-rate method generalizes, it supplies a quantitative way to compare the data efficiency of different parameterizable biases on the same task.
A pre-registered replication with external seeds would convert the current exploratory measurement into a confirmatory result.

Load-bearing premise

The post-hoc beta_diff estimator isolates the symmetry-data exchange rate on the coarse grid without residual confounds from the controlled task setup.

What would settle it

A fresh-seed replication on a finer sqrt(2)-spaced grid of sample sizes N in which the OLS slope for beta_diff lies well outside the reported interval or the wrong-group confidence interval includes zero.

Figures

Figures reproduced from arXiv: 2606.01090 by Ahmed M. Adly.

**Figure 1.** The task and the control idea. (a) Inputs are sampled on an annulus and labeled 1[cos(nθ) > 0] for n ∈ {1, 2, 3, 4, 6, 8, 12}, an alternating angular pattern whose symmetry group is exactly C_n by construction; the number of positive lobes is n, while the full alternating pattern has 2n angular sectors. (b) The equivariant model averages a point over its correct C_n orbit; the wrong-group control averages o…

**Figure 2.** The measured exchange rate. Relative exchange rate β_diff (slope of log2(N_target^vanilla / N_target^treatment) against log2 n) for each treatment at ε = 0, with 95% confidence intervals from a 10,000-sample pairs bootstrap; the dashed line marks the theoretical prediction of +1.0. The equivariant model sits at +1.28, consistent with theory; the regularized baseline is near zero; the wrong-group control is n…

**Figure 3.** The scaling law the rate comes from. log2 N_target against log2 n at ε = 0 for all five families, with ordinary-least-squares fits. All absolute slopes are positive because task difficulty grows with n; the equivariant slope is the shallowest. The exchange rate is the gap between the vanilla and equivariant slopes. Takeaway: the structural advantage is visible as a difference of slopes, not as a negative sl…

**Figure 4.** The augmentation phase transition. log2 N_target for each (model, n) pair, one panel per ε; a dash marks failure to reach the target at any sample size in the grid. The augmented row is dashed for all n ≥ 3 at every ε (orbit augmentation never reaches the target there), while the equivariant row stays low across the full range. Takeaway: on identical information, augmentation and architectural invariance f…

**Figure 5.** Robustness to training-label symmetry breaking.

**Figure 6.** β_diff (vs vanilla) by treatment across T ∈ {0.70, 0.75, 0.80, 0.85} on the CPU replication, with 10,000-sample percentile bootstrap CIs. Equivariant and augmented-with-TTA overlap; wrong-group is consistently negative; regularized hugs zero. The equivariant rate decreases as T rises (refuting the conjecture in Section 4.6). … bit-identical between the two families across all 245 matched (n, N, seed) cells whe…

**Figure 7.** log2 N_target at T = 0.80 across n on the CPU replication. The equivariant and augmented-with-TTA curves overlap exactly at every n (orbit averaging at training and test time is operationally equivalent to architectural equivariance). The training-time-only augmented baseline (purple) fails to reach the target for n ≥ 3, undefined beyond.

**Figure 8.** Regularization pilot (n = 6, N = 400, 3 seeds): the weight-L2 norm ratio (regularized / equivariant) crosses 1.0 at λ ≈ 10^-3, the headline value used in the paper.

Original abstract

Equivariance theory predicts that an architectural symmetry prior reduces sample complexity by a factor of |G|; this is widely cited but rarely measured as a scaling law with controls that separate the prior from its confounds. On a controlled C_n-symmetric task, we report three findings. First, a wrong-group control with identical orbit size and matched compute is worse than no constraint (joint pairwise CI [+0.79, +3.26] excludes zero, robust across estimators); a misaligned constraint is actively harmful, not merely unhelpful. Second, an augmentation baseline equipped with test-time orbit averaging matches the equivariant model exactly (bit-identical per-epoch validation curves across matched cells), so the architecture-vs-augmentation gap is conditional on asymmetric test-time computation, not unconditional. Third, the relative exchange rate beta_diff = 1.28 is consistent in sign and order of magnitude with the theoretical 1.0 (single-level CI [+0.92, +2.05]); the more conservative two-level bootstrap (seeds × group sizes) widens this to [-0.63, +1.72], including zero, and a finer-N replication on a sqrt(2)-spaced grid is inconclusive (point estimate -0.82). The methodological contributions (the relative-rate estimator that cancels the shared-difficulty confound, the wrong-group control, and a pre-specified failure taxonomy) transfer to any inductive bias whose strength can be parameterised. Honest scoping: the primary estimator beta_diff was adopted post-hoc after the initial analysis revealed a positive-slope identifiability problem; the design was never externally pre-registered; and the headline number rests on an OLS slope over seven group sizes on a coarse N grid. This is an exploratory study, not a confirmatory measurement; the wrong-group result is the cleanest finding and the one we report with the most confidence. A registered replication on fresh seeds is future work.
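
The abstract contrasts a single-level interval with a two-level bootstrap over seeds and group sizes. The excerpt does not spell out the resampling scheme, so the following is only one plausible reading: resample seeds with replacement, resample group sizes with replacement, average the log-ratios over the drawn seeds, and refit the slope. The data are synthetic, and every name and default (`two_level_bootstrap`, `n_boot`, the noise level) is an assumption made for illustration.

```python
import math
import random

def ols_slope(xs, ys):
    """OLS slope of ys on xs; None when all xs coincide (slope undefined)."""
    if len(set(xs)) < 2:
        return None
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def percentile_interval(values, level=0.95):
    """Simple percentile interval from a list of bootstrap replicates."""
    s = sorted(values)
    lo = s[int((1.0 - level) / 2.0 * (len(s) - 1))]
    hi = s[int((1.0 + level) / 2.0 * (len(s) - 1))]
    return lo, hi

def two_level_bootstrap(log_ratio, group_sizes, n_boot=2000, seed=0):
    """log_ratio[s][i] = log2(N_vanilla / N_treatment) for seed s, size i."""
    rng = random.Random(seed)
    n_seeds, k = len(log_ratio), len(group_sizes)
    slopes = []
    while len(slopes) < n_boot:
        seed_idx = [rng.randrange(n_seeds) for _ in range(n_seeds)]  # level 1
        size_idx = [rng.randrange(k) for _ in range(k)]              # level 2
        xs = [math.log2(group_sizes[i]) for i in size_idx]
        ys = [sum(log_ratio[s][i] for s in seed_idx) / n_seeds
              for i in size_idx]
        slope = ols_slope(xs, ys)
        if slope is not None:  # skip degenerate draws
            slopes.append(slope)
    return percentile_interval(slopes)

# SYNTHETIC data: true slope 1.0 plus Gaussian noise, 5 hypothetical seeds.
group_sizes = [1, 2, 3, 4, 6, 8, 12]
noise = random.Random(123)
log_ratio = [[math.log2(n) + noise.gauss(0.0, 0.3) for n in group_sizes]
             for _ in range(5)]

print(two_level_bootstrap(log_ratio, group_sizes))
```

Resampling the group sizes as well as the seeds treats the seven grid points themselves as a sample, which is what makes the two-level interval wider than a seeds-only interval.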

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The mismatched symmetry control hurts more than no constraint at all, but the exchange rate estimate is too noisy to trust.

The paper's clearest result is that a wrong-group control with matched orbit size and compute performs worse than an unconstrained baseline, with a joint CI excluding zero that holds across estimators. This pushes back on the common assumption that symmetry priors are at worst neutral.

What is new is the wrong-group control itself and the relative-rate estimator designed to cancel shared-difficulty confounds. These are practical additions for testing how much an inductive bias actually buys, and they are not standard in the equivariance literature referenced.

The paper does well by flagging its own exploratory status, the post-hoc switch to beta_diff after an identifiability problem, and the inconclusive finer-grid replication. The harm finding is presented with appropriate caution and looks the most robust.

The soft spot is that beta_diff comes from OLS over seven coarse points, the two-level bootstrap CI includes zero, and the design was never pre-registered. Without seeing the full implementation details it is hard to rule out residual task-specific confounds, though the authors are open about this.

This is for researchers who want empirical ways to measure symmetry priors or other inductive biases in data-limited settings. The control ideas could transfer.

It deserves peer review because the harm result is cleanly scoped and the methods are usable even if the headline rate needs tighter data.

Referee Report

2 major / 2 minor

Summary. The manuscript empirically measures the symmetry-data exchange rate on a controlled C_n-symmetric task. It reports three main findings: (1) a wrong-group control with identical orbit size and matched compute is actively harmful relative to no constraint (joint pairwise CI [+0.79, +3.26] excludes zero, robust across estimators); (2) an augmentation baseline with test-time orbit averaging produces bit-identical validation curves to the equivariant model; (3) the relative exchange rate beta_diff equals 1.28 (single-level CI [+0.92, +2.05]), consistent in sign and order of magnitude with the theoretical value of 1.0, though the conservative two-level bootstrap widens the interval to [-0.63, +1.72] (which includes zero) and a finer-N replication is inconclusive. The study is explicitly scoped as exploratory, with post-hoc adoption of beta_diff after an identifiability issue, no external pre-registration, and reliance on OLS over seven coarse grid points.

Significance. If the wrong-group harm result holds under the stated controls, it supplies direct empirical evidence that misaligned inductive biases can increase sample complexity rather than merely failing to reduce it, with clear implications for equivariance theory and inductive-bias design. The relative-rate estimator (which cancels shared-difficulty confounds) and the wrong-group control are transferable methodological contributions to any parameterized inductive bias. The manuscript's explicit honesty about its exploratory status, post-hoc estimator choice, and inconclusive replication strengthens its credibility as a methods contribution in stat.ME.

major comments (2)

[Abstract] Abstract and results on beta_diff: the primary estimator was adopted post-hoc after the initial analysis revealed a positive-slope identifiability problem; the reported consistency with theory therefore rests on an OLS slope fitted to seven points on a coarse N grid, and the more conservative two-level bootstrap CI includes zero while the sqrt(2)-spaced replication yields -0.82. This makes the exchange-rate claim load-bearing only under the acknowledged exploratory framing and requires explicit discussion of residual confounds from the controlled task setup.
[Methods] Methods and results on beta_diff: the claim that beta_diff validly isolates the symmetry-data exchange rate assumes the post-hoc OLS specification over the coarse grid has no residual confounds from orbit-size matching or compute equalization; the paper does not provide a pre-specified sensitivity analysis or alternative estimators that were ruled out before seeing the data.

minor comments (2)

[Abstract] The abstract states that the methodological contributions transfer to any inductive bias whose strength can be parameterised; a short paragraph illustrating one additional example (e.g., sparsity or invariance) would clarify scope without lengthening the paper.
The pre-specified failure taxonomy is mentioned but not detailed; a brief enumeration or reference to supplementary material would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for highlighting the need for explicit discussion of the exploratory framing around beta_diff. We agree that the post-hoc adoption of the estimator and lack of pre-specification warrant additional caveats on potential confounds, and we will revise the manuscript to address these points directly.

Referee: [Abstract] Abstract and results on beta_diff: the primary estimator was adopted post-hoc after the initial analysis revealed a positive-slope identifiability problem; the reported consistency with theory therefore rests on an OLS slope fitted to seven points on a coarse N grid, and the more conservative two-level bootstrap CI includes zero while the sqrt(2)-spaced replication yields -0.82. This makes the exchange-rate claim load-bearing only under the acknowledged exploratory framing and requires explicit discussion of residual confounds from the controlled task setup.

Authors: We agree. The manuscript already flags the post-hoc adoption, exploratory status, and inconclusive replication, but we will revise the abstract to add an explicit sentence on residual confounds (e.g., from orbit-size matching and compute equalization) and to qualify the consistency claim more narrowly as holding only under the exploratory framing. revision: yes

Referee: [Methods] Methods and results on beta_diff: the claim that beta_diff validly isolates the symmetry-data exchange rate assumes the post-hoc OLS specification over the coarse grid has no residual confounds from orbit-size matching or compute equalization; the paper does not provide a pre-specified sensitivity analysis or alternative estimators that were ruled out before seeing the data.

Authors: We acknowledge the absence of pre-specification as a genuine limitation of the study. In revision we will insert a short methods subsection that (a) states the OLS choice was post-hoc, (b) lists the sensitivity checks performed after seeing the data, and (c) explicitly notes that no pre-specified analysis plan existed. We will not claim the estimator is free of all residual confounds. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical measurements with explicit exploratory scoping

The manuscript reports direct empirical estimates (OLS slopes, bootstrap CIs, pairwise comparisons) from controlled experiments on a C_n-symmetric task rather than any derivation chain. Quantities such as beta_diff are fitted from observed validation curves on a coarse N grid; the paper does not claim these reduce to theoretical inputs by construction or rename a fitted parameter as an independent prediction. The text explicitly flags the post-hoc adoption of the primary estimator, the lack of external pre-registration, and the inconclusive finer-N replication. No self-definitional equations, load-bearing self-citations, uniqueness theorems, or ansatz smuggling are present. The strongest claim (wrong-group harm) rests on a joint CI excluding zero from matched-compute controls, which is an experimental outcome, not a reduction to prior inputs. This is the most common honest finding for an empirical measurement study.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claims rest on the validity of the C_n-symmetric controlled task, matched compute/orbit-size controls, and statistical assumptions underlying the OLS slope and two-level bootstrap; the post-hoc estimator choice is explicitly flagged.

free parameters (1)

beta_diff = 1.28
OLS slope fitted across seven group sizes on coarse N grid to estimate exchange rate

axioms (1)

Domain assumption: the experimental task is exactly C_n-symmetric, and the controls isolate the symmetry prior from compute and orbit-size confounds.
Invoked in the controlled experiment design and wrong-group comparison

Reference graph

Works this paper leans on

23 extracted references · 5 canonical work pages

[1]

Cormorant: Covariant molecular neural networks

Brandon Anderson, Truong-Son Hy, and Risi Kondor. Cormorant: Covariant molecular neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2019

2019
[2]

Scikit-dimension: A Python package for intrinsic dimension estimation

Jonathan Bac, Evgeny M. Mirkes, Alexander N. Gorban, Ivan Tyukin, and Andrei Zinovyev. Scikit-dimension: A Python package for intrinsic dimension estimation. Entropy, 23(10):1368, 2021. doi:10.3390/e23101368

work page doi:10.3390/e23101368 2021
[3]

E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials

Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P. Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E. Smidt, and Boris Kozinsky. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature Communications, 13(1):2453, 2022. doi:10.1038/s41467-022-29939-5

work page doi:10.1038/s41467-022-29939-5 2022
[4]

A model of inductive bias learning

Jonathan Baxter. A model of inductive bias learning. Journal of Artificial Intelligence Research, 12:149–198. doi:10.1613/jair.731

work page doi:10.1613/jair.731
[6]

On the sample complexity of learning under geometric stability

Alberto Bietti, Luca Venturi, and Joan Bruna. On the sample complexity of learning under geometric stability. In Advances in Neural Information Processing Systems 34 (NeurIPS), 2021

2021
[7]

Geometric deep learning: Grids, groups, graphs, geodesics, and gauges

Michael M. Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478, 2021

Pith/arXiv arXiv 2021
[8]

Group equivariant convolutional networks

Taco S. Cohen and Max Welling. Group equivariant convolutional networks. In Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016

2016
[9]

Convolutional neural networks on graphs with fast localized spectral filtering

Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems (NIPS), 2016

2016
[10]

Multiple comparisons among means

Olive Jean Dunn. Multiple comparisons among means. Journal of the American Statistical Association, 56(293):52–64, 1961

1961
[11]

An Introduction to the Bootstrap

Bradley Efron and Robert J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, 1993

1993
[12]

Provably strict generalisation benefit for equivariant models

Bryn Elesedy and Sheheryar Zaidi. Provably strict generalisation benefit for equivariant models. In Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, pages 2959–2969, 2021

2021
[13]

Estimating the intrinsic dimension of datasets by a minimal neighborhood information

Elena Facco, Maria d’Errico, Alex Rodriguez, and Alessandro Laio. Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Scientific Reports, 7:12140, 2017. doi:10.1038/s41598-017-11873-y

work page doi:10.1038/s41598-017-11873-y 2017
[14]

The statistical crisis in science

Andrew Gelman and Eric Loken. The statistical crisis in science. American Scientist, 102(6):460–465, 2014. Published version of the 2013 working paper “The Garden of Forking Paths”

2014
[15]

On the generalization of equivariance and convolution in neural networks to the action of compact groups

Risi Kondor and Shubhendu Trivedi. On the generalization of equivariance and convolution in neural networks to the action of compact groups. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018

2018
[16]

Maximum likelihood estimation of intrinsic dimension

Elizaveta Levina and Peter J. Bickel. Maximum likelihood estimation of intrinsic dimension. In Advances in Neural Information Processing Systems 17 (NIPS), pages 777–784, 2004

2004
[17]

Learning with invariances in random features and kernel models

Song Mei, Theodor Misiakiewicz, and Andrea Montanari. Learning with invariances in random features and kernel models. In Proceedings of the 34th Conference on Learning Theory (COLT), volume 134 of Proceedings of Machine Learning Research, 2021

2021
[18]

PyTorch: An imperative style, high-performance deep learning library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (NeurIPS), 2019

2019
[19]

Scikit-learn: Machine learning in Python

Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011

2011
[20]

Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint arXiv:1802.08219, 2018

Pith/arXiv arXiv 2018
[21]

General E(2)-equivariant steerable CNNs

Maurice Weiler and Gabriele Cesa. General E(2)-equivariant steerable CNNs. In Advances in Neural Information Processing Systems (NeurIPS), 2019

2019
[22]

The lack of a priori distinctions between learning algorithms

David H. Wolpert. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7):1341–1390, 1996. doi:10.1162/neco.1996.8.7.1341

work page doi:10.1162/neco.1996.8.7.1341 1996
[23]

Deep sets

Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabás Póczos, Ruslan Salakhutdinov, and Alexander J. Smola. Deep sets. In Advances in Neural Information Processing Systems (NeurIPS), 2017

2017

Appendix A, Hyperparameters and software environment (fragment captured with the reference list): Hidden width 32; two hidden ReLU layers plus a linear output; Adam at learning rate 10^-3; batch size 64; up to 500 epoch...
