Low-dimensional topology of deep neural networks
Pith reviewed 2026-07-01 06:38 UTC · model grok-4.3
The pith
ResNets and transformers match in their ability to change linking numbers. Both exceed feedforward networks with monotonic activations, and a nonmonotonic activation closes that gap.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When expressivity is measured by the effect on linking numbers, the skip-connection mechanism in ResNets is as powerful as the attention mechanism in transformers. Both are strictly stronger than feedforward networks with monotonic activations, which are in turn stronger than invertible and flow-based models. A nonmonotonic activation, however, lifts a feedforward network into the same expressivity class as ResNets and transformers. The same ordering persists when the construction is extended from dimension three to arbitrary higher dimensions.
What carries the argument
Linking number of curves in R^3, tracked as it is modified by each layer operation.
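For reference, the linking number of two disjoint closed oriented curves in R^3 can be written as the classical Gauss integral below. The symbols $\gamma_1$, $\gamma_2$, $\mathbf{r}_1$, $\mathbf{r}_2$ are notation introduced here for illustration; this summary does not say which formulation the paper itself uses.

```latex
\[
  \operatorname{Lk}(\gamma_1, \gamma_2)
  = \frac{1}{4\pi} \oint_{\gamma_1} \oint_{\gamma_2}
    \frac{\mathbf{r}_1 - \mathbf{r}_2}{\lVert \mathbf{r}_1 - \mathbf{r}_2 \rVert^{3}}
    \cdot \bigl( d\mathbf{r}_1 \times d\mathbf{r}_2 \bigr)
\]
```

The value is always an integer, and it is unchanged by any orientation-preserving homeomorphism of R^3, which is why a layer must do something non-invertible to alter it.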
If this is right
- ResNets and transformers belong to the same topological expressivity class.
- Feedforward networks with monotonic activations form a strictly weaker class.
- Invertible and flow-based models form the weakest class.
- Switching to a nonmonotonic activation moves a feedforward network into the top class.
- The hierarchy remains unchanged when the models are lifted to dimensions greater than three.
Where Pith is reading between the lines
- The same linking-number test could be applied to other architectural motifs such as normalization layers or gating to produce a finer ranking.
- Designers might deliberately engineer activations whose nonmonotonicity maximizes linking-number change within a given width budget.
- The three-dimensional restriction makes the argument visualizable and separates architecture effects from width effects that usually confound comparisons.
- If the proxy holds, one could pre-screen candidate architectures by simulating their action on a small set of model links before any training.
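The pre-screening idea in the last bullet can be sketched roughly as follows: build a model link, push both components through a candidate layer, and compare linking numbers before and after. This is a minimal sketch and not the paper's experimental setup. The helper names (`circle`, `linking_number`, `shear_layer`) are hypothetical, the linking number is estimated by a midpoint-rule discretization of the Gauss integral, and the stand-in layer is an invertible shear, which as an orientation-preserving homeomorphism cannot change the linking number.

```python
import math

def circle(center, u, v, n=200):
    """Closed polygon approximating a unit circle spanned by orthonormal u, v."""
    pts = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        c, s = math.cos(t), math.sin(t)
        pts.append(tuple(center[i] + c * u[i] + s * v[i] for i in range(3)))
    return pts

def _segments(curve):
    """Midpoint and edge vector of every edge of a closed polygon."""
    out = []
    n = len(curve)
    for k in range(n):
        p, q = curve[k], curve[(k + 1) % n]
        mid = tuple((p[i] + q[i]) / 2.0 for i in range(3))
        edge = tuple(q[i] - p[i] for i in range(3))
        out.append((mid, edge))
    return out

def linking_number(curve_a, curve_b):
    """Midpoint-rule estimate of the Gauss linking integral, rounded to an integer."""
    seg_b = _segments(curve_b)
    total = 0.0
    for ma, ta in _segments(curve_a):
        for mb, tb in seg_b:
            r = (ma[0] - mb[0], ma[1] - mb[1], ma[2] - mb[2])
            cross = (ta[1] * tb[2] - ta[2] * tb[1],
                     ta[2] * tb[0] - ta[0] * tb[2],
                     ta[0] * tb[1] - ta[1] * tb[0])
            dist = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
            total += (r[0] * cross[0] + r[1] * cross[1] + r[2] * cross[2]) / dist ** 3
    return round(total / (4.0 * math.pi))

def shear_layer(p):
    """Hypothetical stand-in layer: the invertible shear (x + 0.5*y, y, z + 0.3*x)."""
    x, y, z = p
    return (x + 0.5 * y, y, z + 0.3 * x)

# Model link: a Hopf link made of two unit circles, each through the other's center.
hopf_a = circle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hopf_b = circle((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

before = linking_number(hopf_a, hopf_b)
after = linking_number([shear_layer(p) for p in hopf_a],
                       [shear_layer(p) for p in hopf_b])
print(before, after)  # the two values agree for an invertible layer
```

Swapping `shear_layer` for a candidate architecture's layer map would give the before/after comparison the bullet describes, with the caveat that a non-injective layer can make the two curves intersect, in which case the integral is no longer defined.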
Load-bearing premise
The size of the change a layer produces in linking number serves as a faithful and architecture-specific proxy for overall expressivity.
What would settle it
An empirical demonstration that two architectures producing identical distributions of linking-number changes nevertheless differ reliably in their ability to represent functions whose decision boundaries require nontrivial topology.
Original abstract
We study layered models, including feedforward networks, ResNets, and transformers, by limiting each layer to a width of $d = 3$, i.e., $\mathbb{R}^3$ as representation space. This allows us to track how a neural network changes low-dimensional topological invariants through its layers. Just about any topological structure may be simplified or even trivialized by simply increasing dimension; e.g., any knot is equivalent to an unknot in $\mathbb{R}^4$. By restricting to $\mathbb{R}^3$, we not only isolate the effects of activation and depth from that of width, we work in a space that lends itself to easy visualization. We focus on linking number here, deferring other invariants like link groups, Milnor's $\bar{\mu}$-invariants, knot types, ambient cobordisms, to a sequel. We provide full proofs and empirical experiments to justify the following insights: When measured by their power to effect changes in linking numbers, the layer-skipping feature in ResNets is as powerful as the attention mechanism in transformers; both ResNets and transformers are strictly more powerful than feedforward neural networks with monotonic activations, which are in turn more powerful than invertible and flow-based models; but replacing monotonic activation with a nonmonotonic one elevates a feedforward network into the same expressivity class as ResNets and transformers. These results suggest that low-dimensional topology can be a useful tool to guide designs of AI architectures. We also generalize our results from $d = 3$ to arbitrary $d > 3$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript restricts neural network layers to width d=3 (representation space R^3) to track changes in the linking number invariant through successive layers. It claims this isolates architectural effects from width and yields a strict hierarchy of expressivity: ResNets (via skip connections) and transformers (via attention) are equivalent and strictly more powerful than feedforward networks with monotonic activations, which in turn exceed invertible/flow-based models; nonmonotonic activations elevate feedforward networks to the top class. Full proofs and empirical experiments are provided for the d=3 case, with a generalization asserted for d>3. The linking-number magnitude under layer operations is presented as the distinguishing proxy.
Significance. If the linking-number change indeed functions as a faithful, width-independent discriminator of expressivity, the work supplies a concrete topological tool for architecture analysis and design. The explicit provision of proofs together with experiments is a positive feature that allows direct inspection of the derivations.
major comments (2)
- [Abstract] Abstract and the central ranking claim: the assertion that magnitude of linking-number alteration under layer operations (skip connections, attention, activation choice) serves as a valid proxy for overall expressivity is load-bearing yet receives no independent validation against task performance, optimization behavior, or other invariants (e.g., fundamental group of the complement). The d=3 restriction and the subsequent generalization both rest on this unanchored proxy.
- [Generalization to d>3] Generalization paragraph: the extension of the d=3 ordering to arbitrary d>3 is stated without additional argument showing that the observed differences in linking-number change persist when the same operations are embedded in higher-dimensional representations or when width is increased while keeping the topological operations fixed.
minor comments (1)
- The abstract refers to 'full proofs' but does not indicate where the key lemmas on linking-number change under each architectural operation are located; a forward pointer would improve readability.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and for highlighting these important points regarding the scope of our claims and the generalization. We address each major comment below.
- Referee: [Abstract] Abstract and the central ranking claim: the assertion that magnitude of linking-number alteration under layer operations (skip connections, attention, activation choice) serves as a valid proxy for overall expressivity is load-bearing yet receives no independent validation against task performance, optimization behavior, or other invariants (e.g., fundamental group of the complement). The d=3 restriction and the subsequent generalization both rest on this unanchored proxy.
Authors: Our work specifically quantifies expressivity via the magnitude of changes to the linking number invariant under different layer operations, as explicitly stated ('When measured by their power to effect changes in linking numbers'). We do not assert or validate that this serves as a proxy for overall expressivity, task performance, or other invariants; such validation is beyond the scope of this topological study, which focuses on low-dimensional topology as a tool for architecture analysis. The d=3 restriction is motivated in the introduction to isolate architectural effects and facilitate visualization and computation of invariants. We will revise the abstract to make the scope of the ranking claim clearer and avoid any implication of broader proxy validity.
Revision: yes
- Referee: [Generalization to d>3] Generalization paragraph: the extension of the d=3 ordering to arbitrary d>3 is stated without additional argument showing that the observed differences in linking-number change persist when the same operations are embedded in higher-dimensional representations or when width is increased while keeping the topological operations fixed.
Authors: We agree that the generalization requires more explicit justification. In dimensions d > 3, the linking number can be computed within any 3-dimensional subspace, and the neural network layers can be designed to act nontrivially only on such a subspace while being the identity elsewhere. This embedding preserves the relative power of the operations (e.g., skip connections allowing independent modification of linking numbers) as in the d=3 case. We will add a detailed paragraph in the generalization section providing this argument and noting that the ordering holds under such embeddings.
Revision: yes
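The embedding described in the rebuttal, a layer that acts on a 3-dimensional subspace and is the identity elsewhere, might look like the sketch below. It illustrates only the construction, not the argument that the ordering persists. The names `lift_to_dimension` and `residual_layer` are hypothetical, the weight matrix is arbitrary, and using the first three coordinates as the subspace is an assumption made for simplicity.

```python
import math

def lift_to_dimension(layer3, d):
    """Extend a map on R^3 to R^d: act on the first three coordinates, identity on the rest."""
    if d < 3:
        raise ValueError("d must be at least 3")

    def lifted(x):
        if len(x) != d:
            raise ValueError(f"expected a point in R^{d}")
        head = layer3(tuple(x[:3]))
        return tuple(head) + tuple(x[3:])

    return lifted

def residual_layer(p):
    """Hypothetical width-3 residual block x + tanh(W x) with an arbitrary fixed W."""
    w = ((0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 0.5))
    pre = [sum(w[i][j] * p[j] for j in range(3)) for i in range(3)]
    return tuple(p[i] + math.tanh(pre[i]) for i in range(3))

layer5 = lift_to_dimension(residual_layer, 5)
point = (1.0, 2.0, 3.0, 4.0, 5.0)
image = layer5(point)
print(image[3:])  # (4.0, 5.0): trailing coordinates pass through unchanged
```

Because the lifted map agrees with the original on the chosen subspace, any curve lying in that subspace is transformed exactly as in the d=3 case.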
Circularity Check
No circularity: claims rest on explicit proofs of linking-number changes
The paper tracks linking numbers under explicit layer operations (skip connections, attention, monotonic vs nonmonotonic activations) in R^3 via direct mathematical proofs and experiments. These derivations start from the definitions of the architectures and the topological invariant itself; no quantity is fitted to data and then relabeled as a prediction, no self-citation supplies a load-bearing uniqueness theorem, and the d=3 to d>3 generalization is stated as a straightforward extension without redefinition. The central ordering of expressivity therefore remains independent of its own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: changes in linking number quantify the relative expressivity of network layers in R^3