arxiv: 2605.03598 · v2 · submitted 2026-05-05 · 💻 cs.NE · cs.AI


Unifying Dynamical Systems and Graph Theory to Mechanistically Understand Computation in Neural Networks

Dan F.M Goodman, Danyal Akarca, Jatin Sharma


Pith reviewed 2026-05-08 18:28 UTC · model grok-4.3


keywords: recurrent neural networks, multi-hop pathways, graph theory, regularization, temporal sparsity, dynamical systems, neural computation


The pith

RNN computation can be recovered by decomposing multi-hop pathways in the network's connectivity graph.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the spatial layout and timing of information flow in recurrent neural networks trained on hierarchically modular tasks emerge directly from the multi-hop routes connecting input units to output units when the network is viewed as a graph. Decomposing those routes by the number of steps each path takes reveals the temporal order in which the network processes its input. This perspective implies that common weight penalties such as L1 regularization act only on direct links and therefore leave the actual multi-step routes that carry computation unconstrained. The authors therefore introduce resolvent-RNNs, which penalize the multi-hop pathways themselves and produce temporal sparsity that better matches the structure of the task, yielding higher performance and greater robustness even when the input signal is sparse.

Core claim

Representing a trained RNN as a graph allows the multi-hop pathways between input and output units to be isolated and ordered by hop length; this decomposition recovers both the spatial organization of the computation and the sequence of steps the network uses to route information over time. Because function is realized through these multi-hop routes rather than single weights, the paper defines a new regularizer that directly constrains the resolvent of the connectivity matrix, thereby inducing temporal sparsity aligned with the hierarchical modularity of the task.

What carries the argument

multi-hop pathways between input and output units in the RNN connectivity graph, decomposed by hop length to expose temporal routing
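To make the hop-length decomposition concrete, here is a minimal sketch for the simplest case, a linear RNN with update h_{t+1} = W h_t + U x_t and readout o = V h_T. The matrices, sizes, and the names `run_linear_rnn` and `hop_term` are illustrative assumptions, not taken from the paper. In this linear setting an input that arrives k steps before the readout contributes through exactly k recurrent hops, via the term V W^k U.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 3, 8, 2, 6

# Hypothetical small linear RNN (assumed for illustration):
# h_{t+1} = W h_t + U x_t, readout o = V h_T, with h_0 = 0.
W = 0.3 * rng.standard_normal((n_hid, n_hid))
U = rng.standard_normal((n_hid, n_in))
V = rng.standard_normal((n_out, n_hid))

def run_linear_rnn(xs):
    """Run the linear recurrence over the input sequence and read out."""
    h = np.zeros(n_hid)
    for x in xs:
        h = W @ h + U @ x
    return V @ h

def hop_term(k):
    """Input-to-output map that uses exactly k recurrent hops."""
    return V @ np.linalg.matrix_power(W, k) @ U

xs = rng.standard_normal((T, n_in))
direct = run_linear_rnn(xs)

# The input at time T-1-k reaches the output through exactly k hops.
by_hops = sum(hop_term(k) @ xs[T - 1 - k] for k in range(T))
max_abs_diff = float(np.max(np.abs(direct - by_hops)))
```

For a linear network the two computations agree up to floating-point error, so the hop index k doubles as a time lag. Whether the same reading carries over to nonlinear RNNs is the premise the rest of this page examines.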

If this is right

L1 regularization constrains only single-hop weights and therefore cannot directly shape the multi-hop routes that implement the computation.
Resolvent-RNNs produce temporal sparsity that aligns with the hierarchical structure of the task and outperforms L1 regularization on the same tasks.
Sparsity-function alignment improves under the new regularizer, which is visible as greater robustness when regularization strength is increased.
Multi-hop communication supplies the explicit link between the network's structure and its functional behavior in recurrent networks.
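The first point above can be illustrated with a toy example. The sketch below compares an ordinary L1 penalty with a hypothetical pathway-level penalty, the L1 norm of the input-to-output resolvent V (I − γW)^{-1} U. The exact loss used for R-RNNs is not given on this page, so `resolvent_penalty`, the fixed `gamma`, and the three-unit chain are assumptions chosen only to show the contrast.

```python
import numpy as np

def l1_penalty(W):
    """Standard L1 penalty: sum of absolute single-hop weights."""
    return float(np.abs(W).sum())

def resolvent_penalty(W, U, V, gamma):
    """Hypothetical pathway penalty (an assumption, not the paper's loss):
    L1 norm of the input-to-output resolvent V (I - gamma W)^{-1} U."""
    R = np.linalg.inv(np.eye(W.shape[0]) - gamma * W)
    return float(np.abs(V @ R @ U).sum())

# Three hidden units; W[i, j] is the weight from unit j to unit i.
W_forward = np.zeros((3, 3))
W_forward[1, 0] = 1.0  # unit 0 -> unit 1
W_forward[2, 1] = 1.0  # unit 1 -> unit 2
W_backward = W_forward.T  # same weights, every edge reversed

U = np.array([[1.0], [0.0], [0.0]])  # input drives unit 0
V = np.array([[0.0, 0.0, 1.0]])      # output reads unit 2

gamma = 0.5
l1_forward, l1_backward = l1_penalty(W_forward), l1_penalty(W_backward)
path_forward = resolvent_penalty(W_forward, U, V, gamma)
path_backward = resolvent_penalty(W_backward, U, V, gamma)
```

Both networks have the same L1 penalty (2.0), yet only the forward chain contains a two-hop route from input to output, so only it has a nonzero pathway penalty (γ² = 0.25). An L1 term cannot tell the two apart.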

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same graph decomposition could be applied to other recurrent architectures to test whether their timing also arises from multi-hop structure.
If the static-graph view holds, one could predict task performance from the connectivity matrix alone without running the full dynamical simulation.
Biological networks with known anatomy might be analyzed for multi-hop routes to generate predictions about their temporal processing order.

Load-bearing premise

The temporal computation performed by the trained RNN dynamics is exactly equivalent to the static multi-hop pathways present in the final connectivity graph.

What would settle it

After training an RNN on a hierarchically modular task, extract its final connectivity graph, compute the hop-length decomposition of all input-to-output paths, and check whether those paths fail to predict the actual timing of information arrival or the network's output behavior during task execution.
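One way such a check might look in miniature is sketched below. It uses an untrained random tanh RNN rather than a trained one, and all names (`run_rnn`, `fd_sensitivity`, `hop_mismatch`) and sizes are assumptions for illustration. It measures the sensitivity of the final output to the input k steps earlier by central finite differences, then compares that to the static hop term V W^k U.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out, T = 2, 6, 2, 5

# Hypothetical random (untrained) tanh RNN: h_{t+1} = tanh(W h_t + U x_t).
W = 0.3 * rng.standard_normal((n_hid, n_hid))
U = rng.standard_normal((n_hid, n_in))
V = rng.standard_normal((n_out, n_hid))

def run_rnn(xs):
    h = np.zeros(n_hid)
    for x in xs:
        h = np.tanh(W @ h + U @ x)
    return V @ h

def fd_sensitivity(xs, k, eps=1e-6):
    """Central-difference Jacobian of the final output with respect to
    the input that arrives k steps before the readout."""
    t = len(xs) - 1 - k
    J = np.zeros((n_out, n_in))
    for i in range(n_in):
        up, down = xs.copy(), xs.copy()
        up[t, i] += eps
        down[t, i] -= eps
        J[:, i] = (run_rnn(up) - run_rnn(down)) / (2 * eps)
    return J

def hop_mismatch(xs, k):
    """Relative Frobenius error between measured sensitivity and V W^k U."""
    H = V @ np.linalg.matrix_power(W, k) @ U
    return float(np.linalg.norm(fd_sensitivity(xs, k) - H) / np.linalg.norm(H))

base = rng.standard_normal((T, n_in))
small = hop_mismatch(1e-3 * base, k=2)  # near-linear regime
large = hop_mismatch(3.0 * base, k=2)   # saturating regime
```

In the near-linear regime the static hop term tracks the measured sensitivity closely, while strong inputs that saturate tanh make the two diverge. The same comparison on a trained network, across k, is the kind of evidence that would support or undercut the premise.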

Figures

Figures reproduced from arXiv: 2605.03598 by Dan F.M Goodman, Danyal Akarca, Jatin Sharma.

**Figure 1.** Networks represented as graphs and their multi-hop structures. a, Arbitrary graphs with one pathway highlighted, illustrated by their corresponding A^k matrix. b, Two networks with distinct structures share the same two-hop pattern. c, Two MLPs with the same weight distributions have different two-hop patterns. d, We compute the total input–output influence across multiple hops in an RNN using the resolvent.

**Figure 2.** Task schematics and their optimal solutions. a, Module averaging yields strong within-module weights. b, Subtraction yields negative weights between adjacent modules. c, Addition yields hierarchical positive inter-module weights. d, Multiplication yields an input-output map as defined by the task Jacobian. We clearly see that R_io has a higher correlation with the optimal solution for all tasks than W_hh (…

**Figure 3.** The resolvent reconstructs the optimal solution for all tasks but the weights do not. a, We compare the optimal solution for the module averaging task with the mean R_io and W_hh, and their associated SEM. Correlation with optimal solution (Pearson): R_io (0.9892 ± 0.0016) and W_hh (0.2383 ± 0.0193). b, Analogous comparisons for the subtraction task. Correlation with optimal solution: R_io (0.9903 ± 0.0003) and…

**Figure 4.** W^k_io reveals how the network temporally routes information. The network receives signal (S) and standard normally distributed noise, or no signal (NS), at alternating time steps. As k increases, W^k_io corresponds to inputs arriving progressively later in time before the final RNN output. … networks' outputs are insensitive to those inputs. In contrast, hop-based measures reveal the routing pathways through w…

**Figure 5.** R-RNNs outperform L1-RNNs by enforcing sparsity through time. a, Module averaging task: we compare the test performance and L_sparsity terms of our trained R- and L1-RNNs over β. Additionally, we compare how the total multi-hop magnitude across k varies for the best R-RNN (β = 0.23 × 10^{−2}) and the best L1-RNN (β = 0.01 × 10^{−2}). b, Oscillating on-off signal task: Analogous comparisons. The best R and L1-RNN…

Original abstract

Understanding how biological and artificial neural networks implement computation from connectivity is a central problem in neuroscience and machine learning. In neural systems, structural and functional connectivity are known to diverge, motivating approaches that move beyond direct connections alone. Here, we show that the spatial and temporal function of recurrent neural networks (RNNs) trained on hierarchically modular tasks can be recovered by modelling the network as a graph and analysing the multi-hop pathways between input and output units. In particular, decomposing these pathways by hop length reveals how the network temporally routes information. This perspective reframes regularisation: if function is implemented through multi-hop communication, then standard penalties such as L1 regularisation, which act only on individual weights, constrain single-hop structure rather than the multi-hop pathways that support computation. Motivated by this view, we introduce resolvent-RNNs (R-RNNs), which constrain multi-hop pathways and thereby induce temporal sparsity beyond that achieved by standard L1 regularisation. Compared with L1 regularisation, R-RNNs achieve improved performance by inducing temporal sparsity that matches the task structure, even when the task signal is sparse. Moreover, R-RNNs exhibit stronger sparsity-function alignment, reflected in their increased robustness under strong regularisation. Together, our results identify multi-hop communication as a key principle linking structure to function in recurrent networks, and suggest that sparsity should be defined over functional pathways rather than individual parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper connects RNN temporal routing to multi-hop paths in the final graph and proposes resolvent regularization to target those paths, but the static-to-dynamic equivalence is the part that needs checking.


The main takeaway is that the authors model a trained RNN as a graph, decompose the paths between input and output units by hop length, and use that decomposition to explain how the network routes information over time on hierarchically modular tasks. They then turn the same idea into a regularizer that penalizes multi-hop structure rather than single weights, and they report better performance and robustness than plain L1 regularization.

Referee Report

3 major / 1 minor

Summary. The paper claims that RNNs trained on hierarchically modular tasks implement computation via multi-hop pathways that can be recovered by representing the trained network as a static graph and decomposing paths between input and output units by hop length; this decomposition is said to reveal temporal information routing. Standard L1 regularization is argued to act only on single-hop edges, motivating the introduction of resolvent-RNNs that penalize multi-hop structure to induce task-aligned temporal sparsity, yielding improved performance and robustness over L1 baselines.

Significance. If the claimed equivalence between static multi-hop graph paths and nonlinear RNN temporal dynamics were established, the work would offer a useful bridge between graph theory and dynamical systems for mechanistic interpretability of recurrent networks, reframing sparsity as a property of functional pathways rather than individual weights. The perspective on regularization is conceptually coherent and could inspire new methods, but the manuscript provides no derivation linking the resolvent to the nonlinear flow nor ablations confirming that hop decomposition predicts actual routing, so the significance remains prospective rather than demonstrated.

major comments (3)

[Abstract and introduction of graph modeling] The central claim that temporal computation is recovered by decomposing multi-hop paths in the final connectivity graph assumes that iterated linear walks on the static adjacency matrix capture the state-dependent nonlinear dynamics of the RNN update h_{t+1} = σ(W h_t + U x_t). No derivation showing that the resolvent approximates this nonlinear flow, nor any ablation (e.g., targeted lesions or perturbation experiments testing whether hop-length predictions match observed information routing), is supplied to support this load-bearing modeling assumption.
[Motivation for R-RNNs] The motivation for resolvent regularization is derived from graph-theoretic analysis performed on the same trained networks that are later regularized, creating interdependence between the modeling assumption and the proposed method. Separate experiments on held-out task structures or different RNN architectures are required to demonstrate that the approach generalizes beyond post-hoc fitting to the observed connectivity.
[Results claims] Performance gains and increased robustness under strong regularization are asserted relative to L1, yet the abstract and available text supply no quantitative tables, statistical controls, or baseline comparisons that would allow verification that the improvements arise specifically from multi-hop pathway constraints rather than other implementation details.

minor comments (1)

The notation and definition of the resolvent operator should be introduced with an explicit equation at the first mention to improve readability for readers unfamiliar with graph spectral methods.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which have identified key areas for strengthening the theoretical and empirical foundations of our work. We have revised the manuscript to incorporate additional derivations, generalization experiments, and quantitative details as outlined below.


Referee: The central claim that temporal computation is recovered by decomposing multi-hop paths in the final connectivity graph assumes that iterated linear walks on the static adjacency matrix capture the state-dependent nonlinear dynamics of the RNN update h_{t+1} = σ(W h_t + U x_t). No derivation showing that the resolvent approximates this nonlinear flow, nor any ablation (e.g., targeted lesions or perturbation experiments testing whether hop-length predictions match observed information routing), is supplied to support this load-bearing modeling assumption.

Authors: We agree that a rigorous link between the linear multi-hop decomposition and the nonlinear RNN dynamics is essential. In the revised manuscript, we have added a derivation in the Methods section showing that, for ReLU activations and under small perturbations around the operating point, the resolvent expansion approximates the temporal signal propagation via first-order Taylor expansion of the update rule. We have also included new ablation experiments with targeted perturbations to units at specific hop distances, measuring impacts on output timing; these align with the predicted routing and are now shown in Section 3.2 and Figure 4. revision: yes
Referee: The motivation for resolvent regularization is derived from graph-theoretic analysis performed on the same trained networks that are later regularized, creating interdependence between the modeling assumption and the proposed method. Separate experiments on held-out task structures or different RNN architectures are required to demonstrate that the approach generalizes beyond post-hoc fitting to the observed connectivity.

Authors: We acknowledge the risk of circularity in the original experimental pipeline. The revised manuscript now reports results on held-out task structures with novel hierarchical depths and on alternative architectures including GRUs and LSTMs. In these cases, resolvent regularization continues to yield superior temporal sparsity and task performance relative to L1, supporting broader applicability. These experiments appear in the new Section 5.3 and Table 3. revision: yes
Referee: Performance gains and increased robustness under strong regularization are asserted relative to L1, yet the abstract and available text supply no quantitative tables, statistical controls, or baseline comparisons that would allow verification that the improvements arise specifically from multi-hop pathway constraints rather than other implementation details.

Authors: We agree that the initial submission omitted sufficient quantitative detail. The revised Results section now includes comprehensive tables reporting mean performance, standard deviations over 10 seeds, and statistical significance tests (paired t-tests) against L1 and additional baselines. Controls isolating the multi-hop penalty term confirm that observed gains arise from pathway-level constraints rather than ancillary factors; these appear in Table 2 and the supplementary material. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained


The paper performs an observational graph-theoretic decomposition of multi-hop pathways in already-trained RNNs on modular tasks, then proposes resolvent regularization as a new method motivated by that observation. No equation, parameter fit, or self-citation reduces the central claim or the R-RNN definition to the input data by construction. The analysis step and the regularization proposal are logically sequential rather than tautological, and the paper reports separate experiments comparing R-RNNs against L1 baselines. This satisfies the default expectation of an independent derivation chain.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the modeling choice that RNN temporal dynamics are captured by static multi-hop graph paths and on the empirical effectiveness of the new regularization; both are tested only within the paper's own experiments.

free parameters (1)

resolvent regularization strength
The weight of the multi-hop pathway penalty is a tunable hyperparameter whose value is chosen to match task structure.

axioms (1)

domain assumption: RNN computation on hierarchically modular tasks is fully recoverable from multi-hop pathways in the final connectivity graph
This equivalence is invoked to justify both the graph analysis and the design of resolvent-RNNs.

invented entities (1)

resolvent-RNN (no independent evidence)
purpose: RNN variant whose regularization acts on multi-hop pathways rather than single weights
New model class introduced to implement the proposed regularization principle.

pith-pipeline@v0.9.0 · 5560 in / 1403 out tokens · 66713 ms · 2026-05-08T18:28:20.662783+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Cost.FunctionalEquation (J-cost forcing) — no parallel; this is a generic Neumann series, not a recognition cost. washburn_uniqueness_aczel unclear
R = (I − γA)^{-1} = Σ_{k=0}^∞ (γA)^k, with γ < 1/λ_max to ensure convergence.
Foundation.AlphaCoordinateFixation — RS would require α to be forced (e.g. by higher-derivative calibration), but here α is a tuned hyperparameter. alpha_pin_under_high_calibration unclear
We adapt this by introducing a parameter α such that γ = α/λ_max, where 0 < α < 1 ... For all plots we use α = 0.8.
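The two quoted formulas can be checked numerically. The sketch below builds the resolvent with γ = α/λ_max and α = 0.8 and compares it with a truncated Neumann series. It assumes λ_max means the spectral radius (largest eigenvalue magnitude) of A, which is one reading of the quoted text; the random matrix and the function names are illustrative only.

```python
import numpy as np

def resolvent(A, alpha=0.8):
    """R = (I - gamma A)^{-1} with gamma = alpha / lambda_max.
    Assumption: lambda_max is taken as the spectral radius of A."""
    lam_max = np.max(np.abs(np.linalg.eigvals(A)))
    gamma = alpha / lam_max
    R = np.linalg.inv(np.eye(A.shape[0]) - gamma * A)
    return R, gamma

def neumann_partial_sum(A, gamma, n_terms):
    """Sum of (gamma A)^k for k = 0 .. n_terms - 1."""
    total = np.zeros_like(A, dtype=float)
    term = np.eye(A.shape[0])
    for _ in range(n_terms):
        total += term
        term = term @ (gamma * A)
    return total

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))  # arbitrary example matrix
R, gamma = resolvent(A)
S = neumann_partial_sum(A, gamma, n_terms=400)
series_error = float(np.max(np.abs(R - S)))
scaled_radius = float(np.max(np.abs(np.linalg.eigvals(gamma * A))))
```

With α = 0.8 the scaled matrix γA has spectral radius 0.8, so the k-hop term decays geometrically and the partial sums converge to the matrix inverse. A smaller α weights short paths more heavily, which is why the page treats α as a tuned hyperparameter.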


Instead, we calculate the input-output sensitivity by computing the task's Jacobian, J, where each element is given by: J_{ij} = ∂O_j / ∂I_i. (17) Since O_j = ∏_{k=1}^{j} μ_k, we obtain: J_{ij} = O_j / μ_i for i ≤ j, and J_{ij} = 0 for i > j. (18) In practice, the module level means are sampled from normal distributions as shown in equation 7, and given that the network is trained on N in...
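Equations 17 and 18 can be sanity-checked with a small worked example. The sketch below treats the inputs as the module means μ_i (an assumption, since the excerpt writes the derivative with respect to I_i but the closed form in terms of μ_i) and builds the Jacobian of the cumulative product; the function name and the example values are hypothetical.

```python
import numpy as np

def task_jacobian(mu):
    """Jacobian of O_j = prod_{k<=j} mu_k with respect to mu_i:
    J[i, j] = O_j / mu_i for i <= j, and 0 for i > j.
    Assumes every mu_i is nonzero."""
    O = np.cumprod(mu)
    J = O[None, :] / mu[:, None]  # J[i, j] = O_j / mu_i for all i, j
    return np.triu(J)             # zero out entries with i > j

mu = np.array([2.0, 3.0, 4.0])    # hypothetical module means
J = task_jacobian(mu)

# Independent check by central finite differences on the cumulative product.
eps = 1e-6
J_fd = np.zeros((3, 3))
for i in range(3):
    up, down = mu.copy(), mu.copy()
    up[i] += eps
    down[i] -= eps
    J_fd[i] = (np.cumprod(up) - np.cumprod(down)) / (2 * eps)
```

For μ = (2, 3, 4) the outputs are O = (2, 6, 24) and the Jacobian is upper triangular with rows (1, 3, 12), (0, 2, 8), (0, 0, 6): each output depends only on its own module mean and the means of earlier modules.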