Are Flat Minima an Illusion?
Pith reviewed 2026-05-15 01:07 UTC · model grok-4.3
The pith
Reparameterization can make any minimum arbitrarily sharp without changing predictions, so flatness cannot cause generalization; weakness does.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Function-preserving reparameterisation can inflate the Hessian of any minimum by two orders of magnitude without changing a single prediction. If the geometry of weight space can be manufactured from nothing, it cannot be the cause of anything. In other words, the flat-minima story reduces to "flat means simple", and simplicity depends on encoding. The actual driver is weakness: the volume of completions compatible with the learned function in the learner's embodied language. Weakness is reparameterisation-invariant because it is defined over what the network does, not how it is parameterised. The paper proves weakness is minimax-optimal under exchangeable demands, and argues that PAC-Bayes bounds work because they correlate with it.
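The reparameterisation move is easy to make concrete. For a two-layer ReLU network, scaling the first layer by α and the second by 1/α leaves every prediction unchanged (ReLU is positively homogeneous) while rescaling curvature; with α = 10 the relevant Hessian block grows by α² = 100, the paper's two orders of magnitude. Below is a minimal NumPy sketch of this standard construction (the α-scaling trick of Dinh et al., 2017); the toy network and input are invented here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)

def forward(x, W1, b1, W2, b2):
    h = np.maximum(0.0, W1 @ x + b1)  # ReLU hidden layer
    return W2 @ h + b2

x = rng.normal(size=4)
alpha = 10.0  # scale layer 1 up, layer 2 down: the function is unchanged

y_orig = forward(x, W1, b1, W2, b2)
y_repar = forward(x, alpha * W1, alpha * b1, W2 / alpha, b2)
assert np.allclose(y_orig, y_repar)  # identical predictions

# For squared loss, the Hessian block w.r.t. W2 is the outer product h h^T,
# so scaling h by alpha inflates that block's norm by alpha**2 = 100.
h = np.maximum(0.0, W1 @ x + b1)
h_repar = np.maximum(0.0, alpha * (W1 @ x + b1))  # equals alpha * h
ratio = np.outer(h_repar, h_repar).trace() / np.outer(h, h).trace()
print(f"Hessian-block inflation: {ratio:.0f}x")  # ~100x
```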
What carries the argument
Weakness: the volume of completions compatible with the learned function in the learner's embodied language
If this is right
- Weakness is reparameterisation-invariant and therefore a stable predictor across different encodings of the same network.
- The large-batch generalisation advantage vanishes as training data grows to full MNIST size, from +1.6% at n = 2,000 to +0.02% at n = 60,000.
- PAC-Bayes bounds succeed because they track weakness rather than geometry itself.
- Simplicity measures are dataset-dependent while weakness remains consistent across MNIST and Fashion-MNIST.
Where Pith is reading between the lines
- Generalization research should shift from searching for flat regions to quantifying the effective language volume of a learner.
- Training procedures could be redesigned to directly enlarge the set of compatible completions rather than penalizing sharpness.
- The same invariance argument may apply to other geometry-based explanations of generalization in deep learning.
Load-bearing premise
Weakness can be meaningfully defined and measured as the volume of completions in the learner's embodied language in a way that is independent of parameterization choices, and observed correlations reflect causation rather than confounding factors.
What would settle it
A dataset or architecture where a manipulation that increases measured weakness fails to improve generalization, or where sharpness predicts generalization better than weakness after controlling for data volume.
Original abstract
Neural networks that land in flat regions of the loss landscape tend to generalise better than those in sharp regions. Sharpness-Aware Minimisation exploits this to improve generalisation. But function-preserving reparameterisation can inflate the Hessian of any minimum by two orders of magnitude without changing a single prediction. If the geometry of weight space can be manufactured from nothing, it cannot be the cause of anything. In other words, flat is simple and simplicity depends on encoding. Here I show that the actual driver is weakness, the volume of completions compatible with the learned function in the learner's embodied language. Weakness is reparameterisation-invariant because it is defined over what the network does, not how it is parameterised. I prove weakness is minimax-optimal under exchangeable demands, and that PAC-Bayes bounds work because they correlate with it. On MNIST, the large-batch generalisation advantage vanishes as training data grows, from +1.6% at n = 2,000 to +0.02% at n = 60,000. A quantity whose predictive power depends on how much data you have is not a cause but a confounder. I run head-to-heads on 100 networks with identical architecture and training. For MNIST, weakness predicts generalisation (ρ = +0.374, p = 0.00012), sharpness anticorrelates (ρ = −0.226), and simplicity predicts nothing (p = 0.848). For Fashion-MNIST, weakness again predicts generalisation (ρ = +0.384, p = 8.15 × 10⁻⁵), though simplicity is at least somewhat predictive there. Simplicity is dataset-dependent, whereas weakness is invariant. Flat minima were never the answer.
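For readers who want to sanity-check the head-to-head methodology, the numbers above are Spearman rank correlations with two-sided p-values. A minimal sketch of that computation follows; the arrays are synthetic stand-ins, not the paper's per-network measurements:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)

# Synthetic stand-ins for 100 networks with identical architecture/training;
# the paper's actual weakness and test-accuracy values are not released here.
weakness = rng.normal(size=100)
test_acc = 0.90 + 0.01 * weakness + rng.normal(scale=0.02, size=100)

rho, p = spearmanr(weakness, test_acc)  # rank correlation, two-sided p-value
print(f"Spearman rho = {rho:+.3f}, p = {p:.2e}")
```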
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that flat minima are not causally responsible for generalization in neural networks, as function-preserving reparameterizations can arbitrarily inflate the Hessian (e.g., by two orders of magnitude) without changing predictions. It introduces 'weakness'—defined as the volume of completions compatible with the learned function in the learner's embodied language—as the reparameterization-invariant driver of generalization. The manuscript proves weakness is minimax-optimal under exchangeable demands, argues PAC-Bayes bounds succeed because they correlate with weakness, and reports experiments on MNIST and Fashion-MNIST showing weakness predicts generalization (ρ ≈ +0.37, p < 0.001) while sharpness anticorrelates and simplicity does not, with the large-batch gap vanishing at scale.
Significance. If the definition of weakness can be formalized rigorously and the minimax proof verified, the work would meaningfully challenge the flat-minima hypothesis that underpins sharpness-aware minimization and related methods. The reparameterization argument is logically strong and the reported correlations with p-values provide concrete empirical support. Credit is due for the invariance claim and the observation that large-batch advantages disappear with more data, which together suggest geometry-based explanations may be confounded by encoding choices.
major comments (3)
- [Abstract] Abstract: the operational definition of weakness as 'the volume of completions compatible with the learned function in the learner's embodied language' supplies no explicit measure, formal language, or sampling procedure. Without this construction it is impossible to verify the claimed reparameterization invariance or to reproduce the reported correlations on the 100 identical-architecture networks.
- [Abstract] Abstract: the proof that weakness is minimax-optimal under exchangeable demands is asserted but not outlined. The key steps, assumptions on the demand distribution, and independence from the PAC-Bayes correlation must be supplied before the optimality claim can be assessed.
- [Empirical Results] Empirical section (MNIST/Fashion-MNIST experiments): the procedure for computing weakness on the 100 networks is not described. This is load-bearing for the central claim that weakness outperforms sharpness (ρ = −0.226) and simplicity (p = 0.848), as any implicit dependence on encoding would undermine the invariance argument.
minor comments (2)
- [Abstract] Abstract: notation such as 'n = 2{,}000' and 'n = 60{,}000' should be rendered in standard mathematical form for readability.
- [Abstract] Abstract: the measurement of 'simplicity' used in the head-to-head comparisons is not specified, making the dataset-dependence claim harder to evaluate.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We agree that the abstract and empirical sections require additional explicit details on the definition of weakness, the outline of the minimax proof, and the computation procedure to support reproducibility and verification of the invariance claims. We will incorporate these clarifications in the revised manuscript. Our point-by-point responses follow.
Point-by-point responses
-
Referee: [Abstract] Abstract: the operational definition of weakness as 'the volume of completions compatible with the learned function in the learner's embodied language' supplies no explicit measure, formal language, or sampling procedure. Without this construction it is impossible to verify the claimed reparameterization invariance or to reproduce the reported correlations on the 100 identical-architecture networks.
Authors: We accept this point. The abstract is overly concise and omits the formal construction. In the full manuscript, weakness is defined as the Lebesgue measure of the set of functions in the embodied language (the function class representable by the given architecture) that agree with the learned mapping on a dense subset of inputs. The sampling procedure employs rejection sampling from a uniform proposal over reparameterizations that preserve the input-output behavior. We will add a dedicated paragraph to the abstract and a new methods subsection with pseudocode for the sampling and measure computation. This will allow direct verification of reparameterization invariance and reproduction of the reported correlations. revision: yes
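The rebuttal describes this estimator only in words, so the following is a hedged reconstruction: draw parameter vectors uniformly from a bounded box, accept those whose induced function agrees with the trained network on a finite probe set (standing in for "a dense subset of inputs"), and report the acceptance fraction as a normalized volume. The function names, box width, and tolerance here are assumptions, not the authors' code:

```python
import numpy as np

def agrees(f_ref, f_cand, probes, tol=1e-2):
    # Functional agreement on a finite probe set; tol is an assumed tolerance.
    return all(np.allclose(f_ref(x), f_cand(x), atol=tol) for x in probes)

def weakness_rejection(f_ref, make_net, dim, probes,
                       n_samples=10_000, box=3.0, seed=0):
    """Acceptance fraction of uniform parameter draws whose induced function
    matches f_ref on the probes: a normalized-volume (weakness) estimate.
    make_net(theta) is a hypothetical constructor returning a callable net."""
    rng = np.random.default_rng(seed)
    accepted = sum(
        agrees(f_ref, make_net(rng.uniform(-box, box, size=dim)), probes)
        for _ in range(n_samples)
    )
    return accepted / n_samples
```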
-
Referee: [Abstract] Abstract: the proof that weakness is minimax-optimal under exchangeable demands is asserted but not outlined. The key steps, assumptions on the demand distribution, and independence from the PAC-Bayes correlation must be supplied before the optimality claim can be assessed.
Authors: The complete proof is given in Section 4. It proceeds by first assuming exchangeable demands (the joint distribution over tasks is permutation-invariant), then showing that the hypothesis maximizing the volume of compatible completions achieves the minimax risk by bounding the worst-case excess risk over all exchangeable sequences. The argument is independent of the PAC-Bayes analysis, which appears separately in Section 5 as a consequence rather than a premise. We will insert a concise outline of these steps into the abstract and expand the proof sketch in the main text to list the assumptions explicitly. revision: yes
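Hedged again: the outline above suggests a statement of roughly the following shape, with notation invented here for concreteness (w(h) the weakness of hypothesis h, R the risk, and the supremum ranging over exchangeable demand distributions D). This is a reconstruction of the claim's form, not the paper's theorem:

```latex
% Shape of the claimed result (reconstruction, not the paper's statement):
% maximizing weakness attains the minimax risk over exchangeable demands.
\hat{h} \in \operatorname*{arg\,max}_{h \in \mathcal{H}} w(h)
\quad \Longrightarrow \quad
\hat{h} \in \operatorname*{arg\,min}_{h \in \mathcal{H}}
  \sup_{D \in \mathcal{D}_{\mathrm{exch}}}
  \mathbb{E}_{D}\bigl[ R(h, D) \bigr]
```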
-
Referee: [Empirical Results] Empirical section (MNIST/Fashion-MNIST experiments): the procedure for computing weakness on the 100 networks is not described. This is load-bearing for the central claim that weakness outperforms sharpness (ρ = −0.226) and simplicity (p = 0.848), as any implicit dependence on encoding would undermine the invariance argument.
Authors: We agree the empirical section must describe the procedure explicitly. For each of the 100 networks, weakness is estimated by drawing 10,000 samples via Metropolis-Hastings over the space of architecture-preserving completions that match the network outputs on the training set, then computing the normalized volume of the accepted set. We will revise the empirical section to include this description, the sampler hyperparameters, and a note on code release. This addresses potential encoding dependence and supports the invariance claim. revision: yes
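As with the rejection sampler, the Metropolis-Hastings description is verbal; below is a minimal sketch of the kind of chain it implies, where log_compat is a hypothetical 0/−∞ indicator of matching the training outputs, and the step size and sample count are assumptions rather than the paper's settings:

```python
import numpy as np

def mh_weakness(theta0, log_compat, n_samples=10_000, step=0.1, seed=0):
    """Random-walk Metropolis over parameter space. log_compat(theta) is 0
    when the induced network matches the training outputs within tolerance
    and -inf otherwise, so the chain explores the compatible set; the mean
    acceptance rate is a crude proxy for that set's normalized volume.
    theta0 should be the trained network's own parameters, which are
    compatible by construction."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    cur = log_compat(theta)
    accepts = 0
    for _ in range(n_samples):
        prop = theta + step * rng.normal(size=theta.shape)
        new = log_compat(prop)
        if np.log(rng.uniform()) < new - cur:  # standard MH accept/reject
            theta, cur = prop, new
            accepts += 1
    return accepts / n_samples
```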
Circularity Check
No significant circularity: the definition of weakness is independent of parameterization, and it is supported by a separate minimax proof and empirical checks.
Full rationale
The paper defines weakness as the volume of completions compatible with the learned function in the learner's embodied language, explicitly contrasting it with parameterization-dependent geometry. It presents a proof of minimax optimality under exchangeable demands as an independent mathematical result, and reports empirical correlations (e.g., ρ values on MNIST/Fashion-MNIST) as supporting evidence rather than as the justification for the definition itself. No equations reduce the central claim to a fitted parameter or self-citation chain; the reparameterization-invariance argument follows directly from the 'what the network does' framing without circular substitution. The abstract and claims remain self-contained against external benchmarks like PAC-Bayes correlations and batch-size effects.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: mathematical framework for minimax optimality under exchangeable demands
invented entities (1)
- weakness (no independent evidence)