Low-dimensional topology of deep neural networks

Junyu Ren; Lek-Heng Lim

arxiv: 2606.31856 · v1 · pith:JDFWXM5B · submitted 2026-06-30 · 💻 cs.LG · math.GT



Pith reviewed 2026-07-01 06:38 UTC · model grok-4.3

classification 💻 cs.LG math.GT

keywords linking number, neural network expressivity, ResNet, transformer, feedforward network, monotonic activation, low-dimensional topology, skip connection


The pith

ResNets and transformers match in their ability to change linking numbers, and both exceed feedforward networks with monotonic activations; switching a feedforward network to a nonmonotonic activation lifts it to the same level.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper restricts all models to three-dimensional representations so that topological invariants like linking numbers can be tracked layer by layer without being trivialized by extra dimensions. It measures how much each architecture type can alter these invariants and derives a strict hierarchy: invertible and flow models change them least, monotonic feedforward networks change them more, and both ResNets (via skip connections) and transformers (via attention) change them most. Replacing the monotonic activation in a plain feedforward net with a nonmonotonic one raises its power to the same level as ResNets and transformers. A reader cares because the ranking is obtained geometrically rather than from task accuracy, isolating the contribution of architectural features such as skips and attention.

Core claim

When the effect on linking numbers is used as the measure, the skip-connection mechanism in ResNets is equivalent in power to the attention mechanism in transformers; both are strictly stronger than feedforward networks that use monotonic activations, which themselves are stronger than invertible and flow-based models; however, a nonmonotonic activation lifts a feedforward network into the same expressivity class as ResNets and transformers. The same ordering persists after the construction is extended from dimension three to arbitrary higher dimensions.

What carries the argument

Linking number of curves in R^3, tracked as it is modified by each layer operation.
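As a numerical illustration (not code from the paper), the linking number of two closed curves can be approximated by discretizing the Gauss linking integral $\mathrm{lk}(A,B) = \frac{1}{4\pi}\oint_A\oint_B \frac{(\mathbf{a}-\mathbf{b})\cdot(d\mathbf{a}\times d\mathbf{b})}{|\mathbf{a}-\mathbf{b}|^3}$. The sketch assumes polygonal samples and a simple midpoint rule chosen for brevity; the helper name `gauss_linking_number` and the Hopf link coordinates are hypothetical. It also applies coordinate-wise |x|, the fold that appears in the paper's figures, to show one way a single layer operation can change the invariant.

```python
import numpy as np


def gauss_linking_number(a: np.ndarray, b: np.ndarray) -> float:
    """Approximate lk(A, B) for closed polygons a (n, 3) and b (m, 3).

    Midpoint-rule discretization of the Gauss linking integral. The
    closing edge (last vertex back to the first) is included via np.roll.
    """
    da = np.roll(a, -1, axis=0) - a                    # edge vectors of A
    db = np.roll(b, -1, axis=0) - b                    # edge vectors of B
    ma = a + 0.5 * da                                  # edge midpoints of A
    mb = b + 0.5 * db                                  # edge midpoints of B
    diff = ma[:, None, :] - mb[None, :, :]             # (n, m, 3) midpoint differences
    cross = np.cross(da[:, None, :], db[None, :, :])   # (n, m, 3) edge cross products
    num = np.einsum("ijk,ijk->ij", diff, cross)
    dist3 = np.linalg.norm(diff, axis=-1) ** 3
    return float(np.sum(num / dist3) / (4.0 * np.pi))


# Hypothetical Hopf link: A is the unit circle in the xy-plane and B is a unit
# circle in the xz-plane centred at (1, 0, 0), which passes once through A's disk.
n = 400
t = 2.0 * np.pi * np.arange(n) / n
A = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
B = np.stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)], axis=1)

lk_before = gauss_linking_number(A, B)
# Coordinate-wise |x| (a fold) applied to both components; the images stay disjoint.
lk_after = gauss_linking_number(np.abs(A), np.abs(B))
print(f"lk before fold: {lk_before:+.3f}, after fold: {lk_after:+.3f}")
```

The first value should be close to ±1 (the sign depends on orientation) and the second close to 0: the folded images retrace arcs back and forth, so their contributions cancel. The midpoint rule is only an approximation; an exact polygonal formula would be preferable for rigorous checks.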

If this is right

ResNets and transformers belong to the same topological expressivity class.
Monotonic feedforward networks form a strictly weaker class.
Invertible and flow-based models form the weakest class.
Switching to a nonmonotonic activation moves a feedforward network into the top class.
The hierarchy remains unchanged when the models are lifted to dimensions greater than three.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same linking-number test could be applied to other architectural motifs such as normalization layers or gating to produce a finer ranking.
Designers might deliberately engineer activations whose nonmonotonicity maximizes linking-number change within a given width budget.
The three-dimensional restriction makes the argument visualizable and separates architecture effects from width effects that usually confound comparisons.
If the proxy holds, one could pre-screen candidate architectures by simulating their action on a small set of model links before any training.

Load-bearing premise

The size of the change a layer produces in linking number serves as a faithful and architecture-specific proxy for overall expressivity.

What would settle it

An empirical demonstration that two architectures producing identical distributions of linking-number changes nevertheless differ reliably in their ability to represent functions whose decision boundaries require nontrivial topology.

Figures

Figures reproduced from arXiv: 2606.31856 by Junyu Ren, Lek-Heng Lim.

**Figure 1.** Linked supports create a topological obstruction to classification: linked class manifolds cannot be contained in disjoint convex decision regions, while hyperplane-separated classes are necessarily unlinked. Practitioners have observed that certain architectural features like skip connections, nonmonotonic activations, attention, etc., consistently outperform alternatives. Engineering reasons have been proffered…

**Figure 3.** Monotonic (top) vs nonmonotonic (bottom) activations. … shows the obstruction is fundamental to monotonicity, not an artifact of ReLU's piecewise-linear form. The key to generalizing beyond ReLU is the concept of link homotopy: a simultaneous continuous deformation of multiple components that keeps them disjoint throughout. Unlike single-component homotopy, where one set moves while others stay fixed, link …
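To make the monotonic/nonmonotonic distinction concrete, here is a small check (an illustration, not from the paper) that samples each activation on a grid and tests whether it ever decreases. The formulas are the standard definitions of ReLU, exact GELU x·Φ(x), and Swish/SiLU x·σ(x); |x| is included because the paper's figures use it as the canonical fold. The grid range and helper names are assumptions.

```python
import math

import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    erf = np.vectorize(math.erf)
    return x * 0.5 * (1.0 + erf(x / math.sqrt(2.0)))


def swish(x):
    # Swish with beta = 1 (also called SiLU): x * sigmoid(x).
    return x / (1.0 + np.exp(-x))


def is_monotone_on_grid(f, lo=-6.0, hi=6.0, n=10_001):
    """Return True if f never decreases between consecutive grid points.

    A grid test can expose nonmonotonicity, but it cannot prove monotonicity.
    """
    x = np.linspace(lo, hi, n)
    return bool(np.all(np.diff(f(x)) >= 0.0))


for name, fn in [("ReLU", relu), ("|x|", np.abs), ("GELU", gelu), ("Swish", swish)]:
    print(f"{name:5s} monotone on grid: {is_monotone_on_grid(fn)}")
```

On this grid ReLU passes, while |x|, GELU, and Swish all fail because each decreases somewhere on the negative axis.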

**Figure 4.** Hopf link unlinking via |x| activations (full resolution …).

**Figure 6.** ResNet skip connection implementing |x| = x + 2 ReLU(−x) on a linked disk-annulus (pt-S¹ link). (a) Input x. (b) Residual branch f(x) ≈ 2 ReLU(−x). (c) Output x + f(x): folding separates components.

**Figure 7.** Linked cycles in CIFAR-10: bird (blue) and deer (red) with link = −1 at ε = 0.034. … reach the expressivity ceiling (ReLU+Skip attains 100% at k = 1, 98.7% at k = 2). As k increases, the best nonmonotonic architectures (GELU/Swish) hold a 3–4 pp accuracy advantage over the best monotonic architecture at k ≥ 10, consistent with nonmonotonic activations being able to resolve each local entanglement separately…

**Figure 8.** Kernel translation argument. If a rank-deficient affine map with v in its kernel does not create intersections, we can slide components along v (unchanged by f) to achieve linear separation, then contract each to a point, forcing link = 0. Proof. Suppose not. Write f(x) = Ax + b with rank(A) < d and pick a unit vector v ∈ ker(A). Since M, N are compact, choose L large enough that M and N + Lv lie in disjoi…
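The step the kernel translation argument relies on is that a rank-deficient affine map f(x) = Ax + b is unchanged by sliding along any v ∈ ker(A), so a component can be moved arbitrarily far along v without changing its image. A minimal numerical check of that invariance only, using a hypothetical rank-2 matrix `A` on R^3 and made-up sample points standing in for a component N (the separation and contraction steps are topological and are not modelled here):

```python
import numpy as np

# Hypothetical rank-deficient affine map f(x) = A x + b on R^3:
# the third row of A is the sum of the first two, so rank(A) = 2.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 1.0]])
b = np.array([0.5, -1.0, 2.0])


def f(points):
    """Apply f row-wise to an array of points with shape (k, 3)."""
    return points @ A.T + b


# Unit kernel vector: x + 2y = 0 and y + z = 0 give v proportional to (-2, 1, -1).
v = np.array([-2.0, 1.0, -1.0]) / np.sqrt(6.0)

rng = np.random.default_rng(0)
N_pts = rng.standard_normal((50, 3))   # sample points standing in for component N
L = 1e3                                # slide distance along v

print(np.allclose(A @ v, 0.0), np.allclose(f(N_pts), f(N_pts + L * v)))
```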

**Figure 9.** Hopf link unlinking via absolute value activations. F.3. ResNet absolute-value synthesis. Proof of Theorem 5.2 (ResNet topological expressivity). The single identity |x| = x + 2 ReLU(−x) realizes coordinate-wise absolute value as one ResNet block: take residual branch G(x) = 2 ReLU(−x) (a width-d ReLU sublayer with input weight −I_d and output weight 2I_d); then F(x) = x + G(x) = |x|. Translated folds x ↦ c …
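The identity in this proof excerpt is easy to check numerically: for x ≥ 0 the residual branch is 0, and for x < 0 it adds −2x, giving −x. Below is a sketch of one residual block with input weight −I_d and output weight 2I_d as in the excerpt; the function names and the choice d = 3 are illustrative assumptions, not the paper's code.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def resnet_abs_block(x, d=3):
    """One ResNet block F(x) = x + G(x) with G(x) = W_out ReLU(W_in x).

    With W_in = -I_d and W_out = 2 I_d, F(x) = x + 2 ReLU(-x) = |x|
    coordinate-wise. Acts row-wise on arrays of shape (k, d).
    """
    w_in = -np.eye(d)
    w_out = 2.0 * np.eye(d)
    residual = relu(x @ w_in.T) @ w_out.T   # residual branch G(x)
    return x + residual                     # skip connection adds the input back


x = np.random.default_rng(1).standard_normal((1000, 3))
print(np.allclose(resnet_abs_block(x), np.abs(x)))
```

A plain feedforward layer ReLU(Wx + c) with the same width has no term that passes x through unchanged, which is the role the skip connection plays here.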

**Figure 10.** Full-resolution visualization of the ResNet skip connection mechanism on the disk-annulus (S⁰-S¹) separation task. This experiment demonstrates that a depth-3 width-2 ResNet with ReLU activations learns to implement the folding operation |x| = x + 2 ReLU(−x) predicted by Theorem 5.2.

**Figure 11.** Linked cycles detected in CIFAR-10 PCA-3D: bird (blue, 58 points) and deer (red, 40 points) interlock with link = −1 at ε = 0.034. Linking-consistency definition. For a class pair $(X_i, X_j)$ and $N$ independent runs (each regenerating the augmented dataset with a fresh random seed and recomputing graphs/cycles), the linking consistency is $\mathrm{consistency}(X_i, X_j) = \frac{1}{N}\sum_{r=1}^{N} \mathbf{1}\big[|\mathrm{link}_r(X_i, X_j)| \ge 1\big]$. Variation …
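The linking-consistency score is simply the fraction of runs in which the two classes are detected as linked. A direct transcription of the formula, assuming the per-run linking numbers link_r(X_i, X_j) have already been computed; the function name and the input list are hypothetical, not data from the paper:

```python
from typing import Sequence


def linking_consistency(link_numbers: Sequence[int]) -> float:
    """consistency(X_i, X_j) = (1/N) * sum_r 1[|link_r(X_i, X_j)| >= 1].

    `link_numbers` holds link_r(X_i, X_j) for each of the N independent runs.
    """
    if not link_numbers:
        raise ValueError("need at least one run")
    linked = sum(1 for lk in link_numbers if abs(lk) >= 1)
    return linked / len(link_numbers)


# Made-up per-run linking numbers for one class pair over N = 5 runs.
print(linking_consistency([-1, 0, -1, 1, 0]))  # → 0.6
```

Only the magnitude is counted, so runs with link = −1 and link = +1 both register as linked.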

Original abstract

We study layered models, including feedforward networks, ResNets, and transformers, by limiting each layer to a width of $d = 3$, i.e., $\mathbb{R}^3$ as representation space. This allows us to track how a neural network changes low-dimensional topological invariants through its layers. Just about any topological structure may be simplified or even trivialized by simply increasing dimension; e.g., any knot is equivalent to an unknot in $\mathbb{R}^4$. By restricting to $\mathbb{R}^3$, we not only isolate the effects of activation and depth from that of width, we work in a space that lends itself to easy visualization. We focus on linking number here, deferring other invariants like link groups, Milnor's $\bar{\mu}$-invariants, knot types, ambient cobordisms, to a sequel. We provide full proofs and empirical experiments to justify the following insights: When measured by their power to effect changes in linking numbers, the layer-skipping feature in ResNets is as powerful as the attention mechanism in transformers; both ResNets and transformers are strictly more powerful than feedforward neural networks with monotonic activations, which are in turn more powerful than invertible and flow-based models; but replacing monotonic activation with a nonmonotonic one elevates a feedforward network into the same expressivity class as ResNets and transformers. These results suggest that low-dimensional topology can be a useful tool to guide designs of AI architectures. We also generalize our results from $d = 3$ to arbitrary $d > 3$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

Linking numbers in d=3 give a clean way to rank skip connections and attention above monotonic feedforwards, but the proxy still needs external checks against actual approximation power.


The paper's core move is to fix width at d=3 so that linking numbers become a usable invariant for comparing how different layers alter topology. They prove that skip connections and attention produce larger changes than feedforward networks with monotonic activations, that nonmonotonic activations close that gap, and that invertible models sit at the bottom; they also lift the statements to d>3.

What stands out is the deliberate restriction to low dimension. It lets them track concrete changes without width confounding the picture, and they supply both proofs and some empirical checks. That is a genuine technical step beyond the usual approximation or VC-dimension arguments.

The soft spot is the leap from linking-number magnitude to expressivity ranking. The ordering is derived cleanly from the operations, but the paper does not show that these differences survive when the same layers are placed in wider networks or that they predict measurable gaps in function approximation or optimization on tasks. Other invariants are deferred, so it is not yet clear how stable the ranking is.

This is for people already working on topological or geometric views of networks. A reader who wants a new invariant to play with will find concrete material; someone looking for direct design guidance will still need the correlation with performance to be demonstrated.

It deserves a serious referee. The proofs are there to check and the experiments can be replicated or extended.

Referee Report

2 major / 1 minor

Summary. The manuscript restricts neural network layers to width d=3 (representation space R^3) to track changes in the linking number invariant through successive layers. It claims this isolates architectural effects from width and yields a strict hierarchy of expressivity: ResNets (via skip connections) and transformers (via attention) are equivalent and strictly more powerful than feedforward networks with monotonic activations, which in turn exceed invertible/flow-based models; nonmonotonic activations elevate feedforward networks to the top class. Full proofs and empirical experiments are provided for the d=3 case, with a generalization asserted for d>3. The linking-number magnitude under layer operations is presented as the distinguishing proxy.

Significance. If the linking-number change indeed functions as a faithful, width-independent discriminator of expressivity, the work supplies a concrete topological tool for architecture analysis and design. The explicit provision of proofs together with experiments is a positive feature that allows direct inspection of the derivations.

major comments (2)

[Abstract] Abstract and the central ranking claim: the assertion that magnitude of linking-number alteration under layer operations (skip connections, attention, activation choice) serves as a valid proxy for overall expressivity is load-bearing yet receives no independent validation against task performance, optimization behavior, or other invariants (e.g., fundamental group of the complement). The d=3 restriction and the subsequent generalization both rest on this unanchored proxy.
[Generalization to d>3] Generalization paragraph: the extension of the d=3 ordering to arbitrary d>3 is stated without additional argument showing that the observed differences in linking-number change persist when the same operations are embedded in higher-dimensional representations or when width is increased while keeping the topological operations fixed.

minor comments (1)

The abstract refers to 'full proofs' but does not indicate where the key lemmas on linking-number change under each architectural operation are located; a forward pointer would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and for highlighting these important points regarding the scope of our claims and the generalization. We address each major comment below.


Referee: [Abstract] Abstract and the central ranking claim: the assertion that magnitude of linking-number alteration under layer operations (skip connections, attention, activation choice) serves as a valid proxy for overall expressivity is load-bearing yet receives no independent validation against task performance, optimization behavior, or other invariants (e.g., fundamental group of the complement). The d=3 restriction and the subsequent generalization both rest on this unanchored proxy.

Authors: Our work specifically quantifies expressivity via the magnitude of changes to the linking number invariant under different layer operations, as explicitly stated ('When measured by their power to effect changes in linking numbers'). We do not assert or validate that this serves as a proxy for overall expressivity, task performance, or other invariants; such validation is beyond the scope of this topological study, which focuses on low-dimensional topology as a tool for architecture analysis. The d=3 restriction is motivated in the introduction to isolate architectural effects and facilitate visualization and computation of invariants. We will revise the abstract to make the scope of the ranking claim clearer and avoid any implication of broader proxy validity. revision: yes
Referee: [Generalization to d>3] Generalization paragraph: the extension of the d=3 ordering to arbitrary d>3 is stated without additional argument showing that the observed differences in linking-number change persist when the same operations are embedded in higher-dimensional representations or when width is increased while keeping the topological operations fixed.

Authors: We agree that the generalization requires more explicit justification. In dimensions d > 3, the linking number can be computed within any 3-dimensional subspace, and the neural network layers can be designed to act nontrivially only on such a subspace while being the identity elsewhere. This embedding preserves the relative power of the operations (e.g., skip connections allowing independent modification of linking numbers) as in the d=3 case. We will add a detailed paragraph in the generalization section providing this argument and noting that the ordering holds under such embeddings. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on explicit proofs of linking-number changes


The paper tracks linking numbers under explicit layer operations (skip connections, attention, monotonic vs non-monotonic activations) in R^3 via direct mathematical proofs and experiments. These derivations start from the definitions of the architectures and the topological invariant itself; no quantity is fitted to data and then renamed a prediction, no self-citation supplies a load-bearing uniqueness theorem, and the d=3 to d>3 generalization is stated as a straightforward extension without redefinition. The central ordering of expressivity therefore remains independent of its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract alone; no explicit free parameters, invented entities, or detailed axioms are extractable.

axioms (1)

domain assumption: Changes in linking number quantify relative expressivity of network layers in R^3
Central measurement used to rank architectures

pith-pipeline@v0.9.1-grok · 5811 in / 1276 out tokens · 40467 ms · 2026-07-01T06:38:55.719009+00:00 · methodology


Reference graph

Works this paper leans on

67 extracted references · 6 canonical work pages · 2 internal anchors

[1]

International Conference on Learning Representations (ICLR) , year =

Park, Sejun and Yun, Chulhee and Lee, Jaeho and Shin, Jinwoo , title =. International Conference on Learning Representations (ICLR) , year =
[2]

International Conference on Learning Representations (ICLR) , year =

Cai, Yongqiang , title =. International Conference on Learning Representations (ICLR) , year =
[3]

Proceedings of the 40th International Conference on Machine Learning , series =

Li, Li'Ang and Duan, Yifei and Ji, Guanghua and Cai, Yongqiang , title =. Proceedings of the 40th International Conference on Machine Learning , series =. 2023 , publisher =

2023
[4]

International Conference on Learning Representations (ICLR) , year =

Kim, Namjun and Min, Chanho and Park, Sejun , title =. International Conference on Learning Representations (ICLR) , year =
[5]

, title =

Palais, Richard S. , title =. Comment. Math. Helv. , volume =. 1960 , pages =

1960
[6]

Hudson, J. F. P. and Zeeman, E. C. , title =. Inst. Hautes \'Etudes Sci. Publ. Math. , volume =. 1964 , pages =

1964
[7]

1976 , isbn =

Rolfsen, Dale , title =. 1976 , isbn =

1976
[8]

2002 , isbn =

Hatcher, Allen , title =. 2002 , isbn =

2002
[9]

, title =

Munkres, James R. , title =. 2000 , isbn =

2000
[10]

2014 , howpublished =

Olah, Christopher , title =. 2014 , howpublished =

2014
[11]

Naitzat, Gregory and Zhitnikov, Andrey and Lim, Lek-Heng , title =. J. Mach. Learn. Res. , volume =. 2020 , number =

2020
[12]

Carlsson, Gunnar , title =. Bull. Amer. Math. Soc. (N.S.) , volume =. 2009 , number =

2009
[13]

, title =

Edelsbrunner, Herbert and Harer, John L. , title =. 2010 , isbn =

2010
[14]

PLOS Computational Biology , volume =

Cang, Zixuan and Wei, Guo-Wei , title =. PLOS Computational Biology , volume =. 2017 , doi =

2017
[15]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Esmaeili, Babak and Walters, Robin and Zimmermann, Heiko and van de Meent, Jan-Willem , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =. 2023 , note =

2023
[16]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year =

He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year =
[17]

and Kaiser,

Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser,. Attention is all you need , booktitle =. 2017 , pages =

2017
[18]

, title =

Cover, Thomas M. , title =. IEEE Transactions on Electronic Computers , volume =. 1965 , pages =

1965
[19]

Machine Learning , volume =

Cortes, Corinna and Vapnik, Vladimir , title =. Machine Learning , volume =. 1995 , pages =

1995
[20]

Cybenko, George , title =. Math. Control Signals Systems , volume =. 1989 , number =

1989
[21]

Neural Networks , volume =

Hornik, Kurt and Stinchcombe, Maxwell and White, Halbert , title =. Neural Networks , volume =. 1989 , number =

1989
[22]

, title =

Barron, Andrew R. , title =. IEEE Trans. Inform. Theory , volume =. 1993 , pages =

1993
[23]

Conference on Learning Theory (COLT) , series =

Telgarsky, Matus , title =. Conference on Learning Theory (COLT) , series =. 2016 , pages =

2016
[24]

Conference on Learning Theory (COLT) , series =

Eldan, Ronen and Shamir, Ohad , title =. Conference on Learning Theory (COLT) , series =. 2016 , pages =

2016
[25]

Neural Networks , volume =

Yarotsky, Dmitry , title =. Neural Networks , volume =. 2017 , pages =

2017
[26]

and Harvey, Nick and Liaw, Christopher and Mehrabian, Abbas , title =

Bartlett, Peter L. and Harvey, Nick and Liaw, Christopher and Mehrabian, Abbas , title =. J. Mach. Learn. Res. , volume =. 2019 , pages =

2019
[27]

2017 , note =

Hanin, Boris and Sellke, Mark , title =. 2017 , note =

2017
[28]

International Conference on Learning Representations (ICLR) , year =

Johnson, Jesse , title =. International Conference on Learning Representations (ICLR) , year =
[29]

2024 , note =

Rochau, Dennis and Chan, Robin and Gottschalk, Hanno , title =. 2024 , note =

2024
[30]

Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS) , series =

Glorot, Xavier and Bordes, Antoine and Bengio, Yoshua , title =. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS) , series =. 2011 , pages =

2011
[31]

and Hannun, Awni Y

Maas, Andrew L. and Hannun, Awni Y. and Ng, Andrew Y. , title =. Proceedings of the ICML 2013 Workshop on Deep Learning for Audio, Speech and Language Processing , year =

2013
[32]

International Conference on Learning Representations (ICLR) , year =

Clevert, Djork-Arn\'e and Unterthiner, Thomas and Hochreiter, Sepp , title =. International Conference on Learning Representations (ICLR) , year =
[33]

Gaussian Error Linear Units (GELUs)

Hendrycks, Dan and Gimpel, Kevin , title =. arXiv:1606.08415 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[34]

Searching for Activation Functions

Ramachandran, Prajit and Zoph, Barret and Le, Quoc V. , title =. arXiv:1710.05941 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[35]

British Machine Vision Conference (BMVC) , year =

Misra, Diganta , title =. British Machine Vision Conference (BMVC) , year =
[36]

Neural Networks , volume =

Elfwing, Stefan and Uchibe, Eiji and Doya, Kenji , title =. Neural Networks , volume =. 2018 , pages =. doi:10.1016/j.neunet.2017.12.012 , note =

work page doi:10.1016/j.neunet.2017.12.012 2018
[37]

International Conference on Learning Representations (ICLR) , year =

Dinh, Laurent and Sohl-Dickstein, Jascha and Bengio, Samy , title =. International Conference on Learning Representations (ICLR) , year =
[38]

and Dhariwal, Prafulla , title =

Kingma, Diederik P. and Dhariwal, Prafulla , title =. Advances in Neural Information Processing Systems , volume =. 2018 , pages =

2018
[39]

Grathwohl, Will and Chen, Ricky T. Q. and Bettencourt, Jesse and Sutskever, Ilya and Duvenaud, David , title =. International Conference on Learning Representations (ICLR) , year =
[40]

Lipman, Yaron and Chen, Ricky T. Q. and Ben-Hamu, Heli and Nickel, Maximilian and Le, Matt , title =. International Conference on Learning Representations (ICLR) , year =
[41]

and Ren, Mengye and Urtasun, Raquel and Grosse, Roger B

Gomez, Aidan N. and Ren, Mengye and Urtasun, Raquel and Grosse, Roger B. , title =. Advances in Neural Information Processing Systems , volume =. 2017 , pages =

2017
[42]

International Conference on Learning Representations (ICLR) , year =

Jacobsen, J\"orn-Henrik and Smeulders, Arnold and Oyallon, Edouard , title =. International Conference on Learning Representations (ICLR) , year =
[43]

Zico and He, Kaiming , title =

Geng, Zhengyang and Deng, Mingyang and Bai, Xingjian and Kolter, J. Zico and He, Kaiming , title =. 2025 , note =

2025
[44]

Chen, Ricky T. Q. and Rubanova, Yulia and Bettencourt, Jesse and Duvenaud, David , title =. Advances in Neural Information Processing Systems , volume =. 2018 , pages =

2018
[45]

Advances in Neural Information Processing Systems , volume =

Dupont, Emilien and Doucet, Arnaud and Teh, Yee Whye , title =. Advances in Neural Information Processing Systems , volume =. 2019 , note =

2019
[46]

, title =

Qu, Ante and James, Doug L. , title =. ACM Trans. Graph. , volume =. 2021 , pages =. doi:10.1145/3450626.3459778 , note =

work page doi:10.1145/3450626.3459778 2021
[47]

IEEE Trans

Bengio, Yoshua and Courville, Aaron and Vincent, Pascal , title =. IEEE Trans. Pattern Anal. Mach. Intell. , volume =. 2013 , number =

2013
[48]

and Welling, Max , title =

Kingma, Diederik P. and Welling, Max , title =. International Conference on Learning Representations (ICLR) , year =
[49]

International

Lott, John , TITLE =. International. 2007 , ISBN =. doi:10.4171/022-1/2 , URL =

work page doi:10.4171/022-1/2 2007
[50]

Proceedings of the

Milnor, John , TITLE =. Proceedings of the. 1987 , ISBN =

1987
[51]

Wall, C. T. C. , TITLE =. Proceedings of the. 1984 , ISBN =

1984
[52]

Proceedings of the

Atiyah, Michael , TITLE =. Proceedings of the. 1987 , ISBN =

1987
[53]

, TITLE =

Birman, Joan S. , TITLE =. Proceedings of the. 1991 , ISBN =

1991
[54]

Highly accurate protein structure prediction with

Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and. Highly accurate protein structure prediction with. Nature , volume =. 2021 , doi =

2021
[55]

Nature , volume =

Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans, Richard and Green, Tim and Pritzel, Alexander and Ronneberger, Olaf and Willmore, Lindsay and Ballard, Andrew J and Bambrick, Joshua and others , title =. Nature , volume =. 2024 , doi =

2024
[56]

utt, Kristof T. and Kindermans, Pieter-Jan and Sauceda, Huziel E. and Chmiela, Stefan and Tkatchenko, Alexandre and M\

Sch\"utt, Kristof T. and Kindermans, Pieter-Jan and Sauceda, Huziel E. and Chmiela, Stefan and Tkatchenko, Alexandre and M\"uller, Klaus-Robert , title =. Advances in Neural Information Processing Systems , volume =. 2017 , pages =

2017
[57]

International Conference on Machine Learning (ICML) , series =

Satorras, V\'. International Conference on Machine Learning (ICML) , series =
[58]

International Conference on Learning Representations (ICLR) , year =

Gasteiger, Johannes and Gro , Janek and G\"unnemann, Stephan , title =. International Conference on Learning Representations (ICLR) , year =
[59]

and Maron, Haggai , title =

Eitan, Yam and Gelberg, Yoav and Bar-Shalom, Guy and Frasca, Fabrizio and Bronstein, Michael M. and Maron, Haggai , title =. International Conference on Learning Representations (ICLR) , year =
[60]

International Conference on Learning Representations (ICLR) , year =

Dumitrescu, Alexandru and Korpela, Dani and Heinonen, Markus and Verma, Yogesh and Iakovlev, Valerii and Garg, Vikas and L\"ahdesmäki, Harri , title =. International Conference on Learning Representations (ICLR) , year =
[61]

Machine Learning and Knowledge Discovery in Databases (ECML PKDD) , pages =

Gai\'. Machine Learning and Knowledge Discovery in Databases (ECML PKDD) , pages =. 2023 , publisher =. doi:10.1007/978-3-031-43418-1_3 , note =

work page doi:10.1007/978-3-031-43418-1_3 2023
[62]

Knots and -Curves Identification in Polymeric Chains and Native Proteins Using Neural Networks , journal =

da Silva, Fernando Bruno and Gabrov. Knots and -Curves Identification in Polymeric Chains and Native Proteins Using Neural Networks , journal =. 2024 , doi =

2024
[63]

Briefings in Bioinformatics , volume =

Han, Bingqing and Zhang, Yipeng and Li, Longlong and Gong, Xinqi and Xia, Kelin , title =. Briefings in Bioinformatics , volume =. 2025 , doi =

2025
[64]

Journal of Chemical Information and Modeling , year =

Wee, Junjie and Jiang, Jian , title =. Journal of Chemical Information and Modeling , year =
[65]

International Joint Conference on Artificial Intelligence (IJCAI) , pages =

Zha, Jirong and Fan, Yuxuan and Yang, Xiao and Gao, Chen and Chen, Xinlei , title =. International Joint Conference on Artificial Intelligence (IJCAI) , pages =. 2025 , doi =

2025
[66]

2022 , howpublished =

LeCun, Yann , title =. 2022 , howpublished =

2022
[67]

2026 , note =

Ren, Junyu and Lim, Lek-Heng , title =. 2026 , note =

2026

[1] [1]

International Conference on Learning Representations (ICLR) , year =

Park, Sejun and Yun, Chulhee and Lee, Jaeho and Shin, Jinwoo , title =. International Conference on Learning Representations (ICLR) , year =

[2] [2]

International Conference on Learning Representations (ICLR) , year =

Cai, Yongqiang , title =. International Conference on Learning Representations (ICLR) , year =

[3] [3]

Proceedings of the 40th International Conference on Machine Learning , series =

Li, Li'Ang and Duan, Yifei and Ji, Guanghua and Cai, Yongqiang , title =. Proceedings of the 40th International Conference on Machine Learning , series =. 2023 , publisher =

2023

[4] [4]

International Conference on Learning Representations (ICLR) , year =

Kim, Namjun and Min, Chanho and Park, Sejun , title =. International Conference on Learning Representations (ICLR) , year =

[5] [5]

, title =

Palais, Richard S. , title =. Comment. Math. Helv. , volume =. 1960 , pages =

1960

[6] [6]

Hudson, J. F. P. and Zeeman, E. C. , title =. Inst. Hautes \'Etudes Sci. Publ. Math. , volume =. 1964 , pages =

1964

[7] [7]

1976 , isbn =

Rolfsen, Dale , title =. 1976 , isbn =

1976

[8] [8]

2002 , isbn =

Hatcher, Allen , title =. 2002 , isbn =

2002

[9] [9]

, title =

Munkres, James R. , title =. 2000 , isbn =

2000

[10] [10]

2014 , howpublished =

Olah, Christopher , title =. 2014 , howpublished =

2014

[11] [11]

Naitzat, Gregory and Zhitnikov, Andrey and Lim, Lek-Heng , title =. J. Mach. Learn. Res. , volume =. 2020 , number =

2020

[12] [12]

Carlsson, Gunnar , title =. Bull. Amer. Math. Soc. (N.S.) , volume =. 2009 , number =

2009

[13] [13]

, title =

Edelsbrunner, Herbert and Harer, John L. , title =. 2010 , isbn =

2010

[14] [14]

PLOS Computational Biology , volume =

Cang, Zixuan and Wei, Guo-Wei , title =. PLOS Computational Biology , volume =. 2017 , doi =

2017

[15] [15]

Advances in Neural Information Processing Systems (NeurIPS) , volume =

Esmaeili, Babak and Walters, Robin and Zimmermann, Heiko and van de Meent, Jan-Willem , title =. Advances in Neural Information Processing Systems (NeurIPS) , volume =. 2023 , note =

2023

[16] [16]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year =

He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , year =

[17] [17]

and Kaiser,

Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser,. Attention is all you need , booktitle =. 2017 , pages =

2017

[18] [18]

, title =

Cover, Thomas M. , title =. IEEE Transactions on Electronic Computers , volume =. 1965 , pages =

1965

[19] [19]

Machine Learning , volume =

Cortes, Corinna and Vapnik, Vladimir , title =. Machine Learning , volume =. 1995 , pages =

1995

[20] [20]

Cybenko, George , title =. Math. Control Signals Systems , volume =. 1989 , number =

1989

[21] [21]

Neural Networks , volume =

Hornik, Kurt and Stinchcombe, Maxwell and White, Halbert , title =. Neural Networks , volume =. 1989 , number =

1989

[22] [22]

, title =

Barron, Andrew R. , title =. IEEE Trans. Inform. Theory , volume =. 1993 , pages =

1993

[23] Telgarsky, Matus. Conference on Learning Theory (COLT), 2016.

[24] Eldan, Ronen and Shamir, Ohad. Conference on Learning Theory (COLT), 2016.

[25] Yarotsky, Dmitry. Neural Networks, 2017.

[26] Bartlett, Peter L. and Harvey, Nick and Liaw, Christopher and Mehrabian, Abbas. J. Mach. Learn. Res., 2019.

[27] Hanin, Boris and Sellke, Mark. 2017.

[28] Johnson, Jesse. International Conference on Learning Representations (ICLR).

[29] Rochau, Dennis and Chan, Robin and Gottschalk, Hanno. 2024.

[30] Glorot, Xavier and Bordes, Antoine and Bengio, Yoshua. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

[31] Maas, Andrew L. and Hannun, Awni Y. and Ng, Andrew Y. Proceedings of the ICML 2013 Workshop on Deep Learning for Audio, Speech and Language Processing, 2013.

[32] Clevert, Djork-Arné and Unterthiner, Thomas and Hochreiter, Sepp. International Conference on Learning Representations (ICLR).

[33] Hendrycks, Dan and Gimpel, Kevin. Gaussian Error Linear Units (GELUs). arXiv:1606.08415.

[34] Ramachandran, Prajit and Zoph, Barret and Le, Quoc V. Searching for Activation Functions. arXiv:1710.05941.

[35] Misra, Diganta. British Machine Vision Conference (BMVC).

[36] Elfwing, Stefan and Uchibe, Eiji and Doya, Kenji. Neural Networks, 2018. doi:10.1016/j.neunet.2017.12.012.

[37] Dinh, Laurent and Sohl-Dickstein, Jascha and Bengio, Samy. International Conference on Learning Representations (ICLR).

[38] Kingma, Diederik P. and Dhariwal, Prafulla. Advances in Neural Information Processing Systems, 2018.

[39] Grathwohl, Will and Chen, Ricky T. Q. and Bettencourt, Jesse and Sutskever, Ilya and Duvenaud, David. International Conference on Learning Representations (ICLR).

[40] Lipman, Yaron and Chen, Ricky T. Q. and Ben-Hamu, Heli and Nickel, Maximilian and Le, Matt. International Conference on Learning Representations (ICLR).

[41] Gomez, Aidan N. and Ren, Mengye and Urtasun, Raquel and Grosse, Roger B. Advances in Neural Information Processing Systems, 2017.

[42] Jacobsen, Jörn-Henrik and Smeulders, Arnold and Oyallon, Edouard. International Conference on Learning Representations (ICLR).

[43] Geng, Zhengyang and Deng, Mingyang and Bai, Xingjian and Kolter, J. Zico and He, Kaiming. 2025.

[44] Chen, Ricky T. Q. and Rubanova, Yulia and Bettencourt, Jesse and Duvenaud, David. Advances in Neural Information Processing Systems, 2018.

[45] Dupont, Emilien and Doucet, Arnaud and Teh, Yee Whye. Advances in Neural Information Processing Systems, 2019.

[46] Qu, Ante and James, Doug L. ACM Trans. Graph., 2021. doi:10.1145/3450626.3459778.

[47] Bengio, Yoshua and Courville, Aaron and Vincent, Pascal. IEEE Trans. Pattern Anal. Mach. Intell., 2013.

[48] Kingma, Diederik P. and Welling, Max. International Conference on Learning Representations (ICLR).

[49] Lott, John. International, 2007. doi:10.4171/022-1/2.

[50] Milnor, John. Proceedings of the, 1987.

[51] Wall, C. T. C. Proceedings of the, 1984.

[52] Atiyah, Michael. Proceedings of the, 1987.

[53] Birman, Joan S. Proceedings of the, 1991.

[54] Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and others. Highly accurate protein structure prediction with. Nature, 2021.

[55] Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans, Richard and Green, Tim and Pritzel, Alexander and Ronneberger, Olaf and Willmore, Lindsay and Ballard, Andrew J and Bambrick, Joshua and others. Nature, 2024.

[56] Schütt, Kristof T. and Kindermans, Pieter-Jan and Sauceda, Huziel E. and Chmiela, Stefan and Tkatchenko, Alexandre and Müller, Klaus-Robert. Advances in Neural Information Processing Systems, 2017.

[57] Satorras, V. International Conference on Machine Learning (ICML).

[58] Gasteiger, Johannes and Groß, Janek and Günnemann, Stephan. International Conference on Learning Representations (ICLR).

[59] Eitan, Yam and Gelberg, Yoav and Bar-Shalom, Guy and Frasca, Fabrizio and Bronstein, Michael M. and Maron, Haggai. International Conference on Learning Representations (ICLR).

[60] Dumitrescu, Alexandru and Korpela, Dani and Heinonen, Markus and Verma, Yogesh and Iakovlev, Valerii and Garg, Vikas and Lähdesmäki, Harri. International Conference on Learning Representations (ICLR).

[61] Gai\'. Machine Learning and Knowledge Discovery in Databases (ECML PKDD), 2023. doi:10.1007/978-3-031-43418-1_3.

[62] da Silva, Fernando Bruno and Gabrov. Knots and -Curves Identification in Polymeric Chains and Native Proteins Using Neural Networks. 2024.

[63] Han, Bingqing and Zhang, Yipeng and Li, Longlong and Gong, Xinqi and Xia, Kelin. Briefings in Bioinformatics, 2025.

[64] Wee, Junjie and Jiang, Jian. Journal of Chemical Information and Modeling.

[65] Zha, Jirong and Fan, Yuxuan and Yang, Xiao and Gao, Chen and Chen, Xinlei. International Joint Conference on Artificial Intelligence (IJCAI), 2025.

[66] LeCun, Yann. 2022.

[67] Ren, Junyu and Lim, Lek-Heng. 2026.