pith · machine review for the scientific record

arXiv:2605.08681 · v1 · submitted 2026-05-09 · 📊 stat.ML · cs.AI · cs.LG · cs.NA · math.NA

Recognition: no theorem link

Core-Halo Decomposition: Decentralizing Large-Scale Fixed-Point Problems

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 00:51 UTC · model grok-4.3

classification 📊 stat.ML · cs.AI · cs.LG · cs.NA · math.NA
keywords core-halo decomposition · fixed-point equations · decentralized multi-agent systems · structural bias · block dependence · parallel optimization · mean operator

The pith

Core-halo decomposition lets agents solve large fixed-point equations without permanent structural bias from cross-block dependencies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that standard strict decomposition, where each agent updates only its own block using local variables, changes the mean operator and creates bias that extra samples or smaller steps cannot remove. Core-halo decomposition instead gives each agent write ownership over a disjoint core while allowing read access to an overlapping halo that captures necessary dependencies. When this split is aligned with the block-dependence structure of the operator, the decentralized system implements the original fixed-point equation exactly. The authors prove this by deriving a Bellman closure condition and a blockwise bias lower bound that strict methods violate but core-halo satisfies. Experiments across applications confirm the decentralized version reaches the same solution quality as a centralized solver while keeping parallelism.
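The split described above can be made concrete in a few lines. This is an illustrative sketch, not the paper's code: `cores` and `deps` are hypothetical names for a disjoint write-ownership partition and a map from each owned row to the external variables its update reads; the halo of an agent is then exactly the read-only context that strict decomposition would truncate.

```python
# Illustrative sketch (not the paper's code): each agent writes only its
# disjoint core but reads an overlapping halo induced by cross-block
# dependencies of the update rule.
d = 6
cores = {0: {0, 1}, 1: {2, 3}, 2: {4, 5}}   # disjoint write ownership
deps = {1: {2}, 3: {4}, 5: {0}}             # row -> external columns it reads

halos = {
    agent: set().union(*(deps.get(i, set()) for i in core)) - core
    for agent, core in cores.items()
}
# Each halo is read-only context; strict decomposition would zero it out.
print(halos)   # {0: {2}, 1: {4}, 2: {0}}
```

Note that the cores partition the variables while the halos may overlap with other agents' cores; only write access must be disjoint.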

Core claim

By separating write ownership from read-only evaluation context and aligning the Core-Halo split with the block-dependence structure of the mean operator F-bar, the original fixed-point problem x* = F-bar(x*) can be solved faithfully in a decentralized multi-agent system; strict decomposition alters the operator and cannot recover the true fixed point, as shown by the Bellman closure condition and blockwise bias lower bound.

What carries the argument

Core-Halo decomposition, which assigns each agent a disjoint core for writing updates and an overlapping halo for reading external dependencies to preserve the original operator.

If this is right

  • Strict decomposition necessarily changes the fixed-point operator for any operator with cross-block dependence, and this change is irreducible by standard convergence adjustments.
  • Core-halo preserves the exact fixed point of the original operator once the split is aligned with dependence blocks.
  • The approach retains full parallelism benefits of decentralization while achieving solution accuracy indistinguishable from centralized computation in tested settings.
  • Bellman closure and blockwise bias bounds give a precise way to diagnose when strict decomposition will fail.
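The last point admits a simple mechanical reading: closure fails exactly when some block's rows read columns outside the block. A hedged sketch of such a diagnostic, with a hypothetical boolean dependence pattern standing in for the operator's structure (the function name and interface are illustrative, not the paper's):

```python
import numpy as np

# Sketch of a closure diagnostic (assumed interface, not the paper's code):
# strict decomposition is unbiased only if the dependence pattern is
# block-diagonal with respect to the chosen partition.
def strictly_closed(pattern, blocks):
    d = pattern.shape[0]
    for Di in blocks:
        outside = [j for j in range(d) if j not in Di]
        if pattern[np.ix_(Di, outside)].any():   # a row in Di reads outside Di
            return False
    return True

pattern = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1],
                    [0, 1, 0, 1]], dtype=bool)   # row 3 reads column 1
assert not strictly_closed(pattern, [[0, 1], [2, 3]])
assert strictly_closed(np.eye(4, dtype=bool), [[0, 1], [2, 3]])
```

When the check returns False, the paper's bias lower bound predicts strict decomposition cannot reach the original fixed point for that partition.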

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same core-halo idea could map to distributed value iteration or policy evaluation in Markov decision processes if state-action dependencies can be partitioned similarly.
  • In large-scale equilibrium computation such as traffic or market models, the method might allow agents to maintain local consistency without a central clearinghouse.
  • A practical test would be to measure communication volume versus solution error on a real-world operator whose dependence graph is sparse but not block-diagonal.

Load-bearing premise

The block-dependence structure of the mean operator is identifiable and can be matched to a core-halo split without adding new bias or requiring full central coordination.
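This premise can at least be probed numerically. A minimal sketch, assuming an affine operator: perturbing one input coordinate at a time recovers the dependence pattern exactly, while for general nonlinear operators a single probe point can under-report dependencies. The function name and tolerances here are illustrative, not the paper's procedure:

```python
import numpy as np

# Illustrative probe, not the paper's procedure: estimate which inputs each
# output of F reads by one-at-a-time perturbation. Exact for affine F;
# may miss dependencies of nonlinear F at a single probe point.
def dependence_pattern(F, d, eps=1e-6, tol=1e-12):
    x0 = np.zeros(d)
    base = F(x0)
    pattern = np.zeros((d, d), dtype=bool)
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        pattern[:, j] = np.abs(F(x0 + e) - base) > tol
    return pattern   # pattern[i, j] is True when output i reads input j

A = np.array([[0.2, 0.0, 0.3],
              [0.0, 0.2, 0.0],
              [0.0, 0.0, 0.2]])
F = lambda x: A @ x + 1.0
assert (dependence_pattern(F, 3) == (A != 0)).all()
```

Note the probe itself touches every coordinate, which is exactly the tension the referee raises: identifying the structure without central coordination is nontrivial.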

What would settle it

Compute the fixed point under strict block decomposition versus core-halo on a synthetic operator whose blocks have known cross-dependencies; if the strict solution differs from the centralized one by more than sampling error while core-halo matches exactly, the claim holds.
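That experiment takes only a few lines for an affine operator F(x) = Ax + b whose coupling matrix has known cross-block entries. Everything below (the blocks, the coupling weights, the iteration count) is an illustrative construction under the assumption that F is a contraction, not the paper's experimental setup:

```python
import numpy as np

# Synthetic contractive operator F(x) = A @ x + b with cross-block coupling.
d = 6
blocks = [[0, 1], [2, 3], [4, 5]]
A = 0.2 * np.eye(d)
A[1, 2] = A[3, 4] = A[5, 0] = 0.3    # dependencies crossing block boundaries
b = np.arange(1.0, d + 1.0)
x_central = np.linalg.solve(np.eye(d) - A, b)   # centralized fixed point

def iterate(context, iters=500):
    """Each agent updates its block using only what `context` exposes."""
    x = np.zeros(d)
    for _ in range(iters):
        x_new = x.copy()
        for i, Di in enumerate(blocks):
            x_new[Di] = (A @ context(x, i) + b)[Di]
        x = x_new
    return x

def strict(x, i):                    # read only owned coordinates, zero the rest
    ctx = np.zeros(d)
    ctx[blocks[i]] = x[blocks[i]]
    return ctx

def core_halo(x, i):                 # read core plus every column the rows touch
    need = np.union1d(blocks[i], np.nonzero(A[blocks[i], :].any(axis=0))[0])
    ctx = np.zeros(d)
    ctx[need] = x[need]
    return ctx

err_strict = np.abs(iterate(strict) - x_central).max()
err_halo = np.abs(iterate(core_halo) - x_central).max()
assert err_halo < 1e-8 < err_strict   # strict is biased; core-halo matches
```

In this toy setting the strict iteration converges, but to the fixed point of a different (block-truncated) operator, matching the claim that the bias is structural rather than a matter of step size or iteration count.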

Figures

Figures reproduced from arXiv:2605.08681 by Haixiang, Jiayu Chen, Jiefu Zhang, Jun He, Xudong Wu, Yang Xu, Zihan Zhou.

Figure 1. Strict Local Decomposition. Recalls the notion of a partition (Definition 1: sets D1, …, Dm ⊆ Ω form a partition of Ω if the Di are pairwise disjoint and their union is Ω) and formalizes the hard-partition rule at the level of the induced mean operator (Definition 2, Strict Decomposition).

Figure 2. Core updates, halo for local context. Core-Halo does not require block closure under F̄; it only requires that every variable needed to evaluate the update on Di be present in the halo Si, a condition exactly aligned with the operator's intrinsic block-dependence structure (Theorem 1, exact recovery by Core-Halo).

Figure 3. Return on the 24 × 24 MiniGrid. A controlled MiniGrid navigation task [11] illustrates the boundary effect predicted by the theory, testing (RQ1) whether strict local decomposition becomes less reliable as the number of partitions increases, and (RQ2) whether the proposed Core-Halo construction remains stable by preserving one-step cross-boundary evaluation context.

Figure 4. Core-Halo preserves the original PageRank fixed point and converges rapidly, while strict …

Figure 5. Learning curves detailing the total grid cost trajectories across the IEEE 9-, 14-, and 30-bus systems.

Figure 6. Traffic-Control Training Dynamics. Surrounding text: under a locality condition Core-Halo exactly preserves the original fixed points; the bias bounds, Bellman gridworld analysis, and experiments on MiniGrid, PageRank, smart grids, and traffic control support this perspective, with limitations noted around exploring more downstream applications and corresponding Core-Halo decompositions.

Figure 7. ‖V^k − V^π‖∞ versus parallel iterations.

Figure 8. Two SUMO topologies used to evaluate strict decomposition, Core-Halo, and cen…
Original abstract

We study solving large-scale fixed-point equation \(x^\star=\bar F(x^\star)\) with decomposition. Standard strict decomposition assigns each agent a disjoint block and evaluates updates using only owned coordinates. For most operators, however, a block update may depend on variables outside the block. Truncating these dependencies by strict decomposition changes the mean operator and creates structural bias that cannot be removed by more samples, smaller stepsizes, or additional consensus. We therefore propose Core-Halo decomposition, which separates write ownership from read-only evaluation context: each agent updates its own core and reads from an overlapping halo. By aligning the Core-Halo decomposition with the block-dependence structure of $\bar F$, the original fixed-point problem can be implemented faithfully in a decentralized multi-agent system. We further characterize the fundamental obstruction faced by strict decomposition through a Bellman closure condition and a blockwise bias lower bound, showing that local-only updates can alter the original fixed-point operator. Finally, we conduct extensive experiments across a range of application settings, and demonstrate that Core-Halo achieves near-centralized performance while retaining the parallelism benefits of decentralization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies decentralized solution of large-scale fixed-point equations x* = F-bar(x*) via Core-Halo decomposition, in which agents own and update disjoint core blocks while reading from overlapping halo regions. It shows that strict block decomposition generally alters the mean operator through a Bellman closure condition and induces an irreducible blockwise bias lower bound, whereas Core-Halo decomposition recovers the original fixed point when the split is aligned with the dependence blocks of F-bar. Experiments across several application domains report performance close to centralized baselines while preserving parallelism.

Significance. If the alignment step can be realized without central coordination or additional bias, the approach would enable faithful decentralized fixed-point computation at scale, with direct relevance to distributed optimization, multi-agent reinforcement learning, and large-scale equilibrium finding. The explicit bias characterization for strict decomposition is a useful theoretical contribution; the empirical results, if they hold under realistic alignment conditions, would strengthen the case for Core-Halo over existing decentralized schemes.

major comments (2)
  1. [§3] §3 (Bellman closure and bias lower bound): the central claim that Core-Halo recovers the original operator rests on exact alignment between the Core-Halo partition and the (unknown) block-dependence structure of F-bar. No procedure is given for agents to discover or agree on these blocks from local halo reads alone; any global identification step would either reintroduce central coordination or risk incomplete halos that re-create the bias the method seeks to avoid.
  2. [§5] §5 (experiments): the reported near-centralized performance is obtained under best-case alignment with the true dependence blocks. It is unclear whether the same performance is achieved when the partition must be inferred locally or when the alignment is imperfect; without such controls, the experiments do not yet demonstrate robustness of the decentralized case.
minor comments (2)
  1. Notation for the mean operator F-bar and the blockwise bias bound should be introduced with an explicit equation reference in the main text rather than only in the appendix.
  2. Figure captions for the experimental plots should state the precise alignment assumption used in each run (oracle vs. locally estimated blocks).

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their insightful comments on our work. Below we provide point-by-point responses to the major comments, clarifying the scope of our contributions and indicating planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [§3] §3 (Bellman closure and bias lower bound): the central claim that Core-Halo recovers the original operator rests on exact alignment between the Core-Halo partition and the (unknown) block-dependence structure of F-bar. No procedure is given for agents to discover or agree on these blocks from local halo reads alone; any global identification step would either reintroduce central coordination or risk incomplete halos that re-create the bias the method seeks to avoid.

    Authors: We thank the referee for this observation. Indeed, the theoretical guarantee that Core-Halo recovers the original fixed point relies on the Core-Halo partition being aligned with the dependence blocks of the operator. The manuscript does not include a procedure for discovering these blocks in a decentralized way from local reads, because our contribution centers on the decomposition technique and the bias analysis rather than the alignment discovery problem. In applications where the operator structure is known a priori (such as in the experimental domains considered), alignment can be set by design. We will update the manuscript to more prominently highlight this requirement and its implications for applicability. revision: partial

  2. Referee: [§5] §5 (experiments): the reported near-centralized performance is obtained under best-case alignment with the true dependence blocks. It is unclear whether the same performance is achieved when the partition must be inferred locally or when the alignment is imperfect; without such controls, the experiments do not yet demonstrate robustness of the decentralized case.

    Authors: The experiments in Section 5 assume perfect alignment to isolate the benefits of Core-Halo over strict decomposition and to show proximity to centralized performance. We agree that additional controls for imperfect alignment would strengthen the empirical claims. In the revised manuscript, we will include a discussion of potential performance degradation under misalignment, drawing from the bias lower bound in §3, and if feasible, add a small experiment illustrating sensitivity. revision: partial

standing simulated objections not resolved
  • Decentralized discovery of the operator's block-dependence structure without central coordination or risk of incomplete halos.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via operator properties

full rationale

The paper derives the Bellman closure condition and blockwise bias lower bound directly from properties of the mean operator F-bar under strict decomposition, without reducing to fitted parameters or self-referential definitions. Core-Halo is introduced as a structural alternative that preserves the original fixed-point equation precisely when the split aligns with the (externally given) block-dependence structure of F-bar; this alignment is an explicit modeling assumption rather than a derived or fitted quantity. No equations in the provided text equate the final result to its inputs by construction, no self-citations are invoked as load-bearing uniqueness theorems, and no ansatz is smuggled via prior work. The claim therefore stands as an independent characterization plus a proposed decomposition, not a renaming or tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the ability to align decomposition with operator structure; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption The mean operator F-bar possesses an identifiable block-dependence structure that can be aligned with Core-Halo partitions.
    The method requires this alignment to faithfully reproduce the original fixed-point without bias.

pith-pipeline@v0.9.0 · 5521 in / 1232 out tokens · 55270 ms · 2026-05-12T00:51:18.611017+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages

  1. [1] Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. Deep equilibrium models. In Advances in Neural Information Processing Systems 32, pages 688–699, 2019.
  2. [2] Richard Bellman. Dynamic Programming. Princeton University Press, 1957.
  3. [3] Dimitri P. Bertsekas. Distributed asynchronous computation of fixed points. Mathematical Programming, 27:107–120, 1983.
  4. [4] Dimitri P. Bertsekas and John N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, 1989.
  5. [5] Jalaj Bhandari, Daniel Russo, and Raghav Singal. Finite time analysis of temporal difference learning with linear function approximation. In Proceedings of the 31st Conference on Learning Theory, volume 75 of Proceedings of Machine Learning Research, pages 1691–1692. PMLR, 2018.
  6. [6] Vivek S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Hindustan Book Agency and Cambridge University Press, 2008.
  7. [7] Stephen Boyd, Arpita Ghosh, Balaji Prabhakar, and Devavrat Shah. Gossip algorithms: Design, analysis and applications. In Proceedings of IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies, volume 3, pages 1653–1664. IEEE, 2005.
  8. [8] Stephen Boyd, Arpita Ghosh, Balaji Prabhakar, and Devavrat Shah. Randomized gossip algorithms. IEEE Transactions on Information Theory, 52(6):2508–2530, 2006.
  9. [9] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
  10. [10] Michael Chang, Sid Kaushik, S. Matthew Weinberg, Tom Griffiths, and Sergey Levine. Decentralized reinforcement learning: Global decision-making via local economic transactions. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 1437–1447. PMLR, 2020.
  11. [11] Maxime Chevalier-Boisvert, Bolun Dai, Mark Towers, Rodrigo Perez-Vicente, Lucas Willems, Salem Lahlou, Suman Pal, Pablo Samuel Castro, and J. K. Terry. MiniGrid & MiniWorld: Modular & customizable reinforcement learning environments for goal-oriented tasks. Advances in Neural Information Processing Systems, 36:73383–73394, 2023.
  12. [12] Panagiotis D. Christofides, Riccardo Scattolini, David Muñoz de la Peña, and Jinfeng Liu. Distributed model predictive control: A tutorial review and future research directions. Computers & Chemical Engineering, 51:21–41, 2013.
  13. [13] Thinh T. Doan, Siva Theja Maguluri, and Justin Romberg. Finite-time analysis of distributed TD(0) with linear function approximation for multi-agent reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning, pages 1626–1635, 2019.
  14. [14] Laurent El Ghaoui, Fangda Gu, Bertrand Travacca, Armin Askari, and Alicia Y. Tsai. Implicit deep learning. SIAM Journal on Mathematics of Data Science, 3(3):930–958, 2021.
  15. [15] Jakob N. Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
  16. [16] Samy Wu Fung, Howard Heaton, Qiuwei Li, Daniel McKenzie, Stanley J. Osher, and Wotao Yin. JFB: Jacobian-free backpropagation for implicit networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 6648–6656, 2022.
  17. [17] David F. Gleich. PageRank beyond the web. SIAM Review, 57(3):321–363, 2015.
  18. [18] Hideaki Ishii and Roberto Tempo. Distributed randomized algorithms for the PageRank computation. IEEE Transactions on Automatic Control, 55(9):1987–2002, 2010.
  19. [19] Tommi Jaakkola, Michael I. Jordan, and Satinder P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185–1201, 1994.
  20. [20] Soummya Kar, José M. F. Moura, and H. Vincent Poor. QD-learning: A collaborative distributed strategy for multi-agent reinforcement learning through consensus + innovations. IEEE Transactions on Signal Processing, 61(7):1848–1862, 2013.
  21. [21] Harold J. Kushner and G. George Yin. Asymptotic properties of distributed and communicating stochastic approximation algorithms. SIAM Journal on Control and Optimization, 25(5):1266–1290, 1987.
  22. [22] Harold J. Kushner and G. George Yin. Stochastic Approximation and Recursive Algorithms and Applications. Springer, 2003.
  23. [23] Pierre-Louis Lions. On the Schwarz alternating method. I. In First International Symposium on Domain Decomposition Methods for Partial Differential Equations, pages 1–42. SIAM, 1988.
  24. [24] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems 30, pages 6379–6390, 2017.
  25. [25] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
  26. [26] Angelia Nedić and Asuman Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1):48–61, 2009.
  27. [27] Brendan O'Donoghue, Giorgos Stathopoulos, and Stephen Boyd. A splitting method for optimal control. IEEE Transactions on Control Systems Technology, 21(6):2432–2442, 2013.
  28. [28] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
  29. [29] Alfio Quarteroni and Alberto Valli. Domain Decomposition Methods for Partial Differential Equations. Oxford University Press, 1999.
  30. [30] Tabish Rashid, Mikayel Samvelyan, Christian Schroeder, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research. PMLR, 2018.
  31. [31] James B. Rawlings, David Q. Mayne, and Moritz M. Diehl. Model Predictive Control: Theory, Computation, and Design. Nob Hill Publishing, 2017.
  32. [32] Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407, 1951.
  33. [33] Gavin A. Rummery and Mahesan Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, University of Cambridge, Department of Engineering, 1994.
  34. [34] Riccardo Scattolini. Architectures for distributed and hierarchical model predictive control – a review. Journal of Process Control, 19(5):723–731, 2009.
  35. [35] Gianluca Serale, Massimo Fiorentini, Alfonso Capozzoli, Daniele Bernardini, and Alberto Bemporad. Model predictive control (MPC) for enhancing building and HVAC system energy efficiency: Problem formulation, applications and opportunities. Energies, 11(3):631, 2018.
  36. [36] Barry F. Smith, Petter E. Bjørstad, and William D. Gropp. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, 1996.
  37. [37] Rayadurgam Srikant and Lei Ying. Finite-time error bounds for linear stochastic approximation and TD learning. In Proceedings of the Thirty-Second Conference on Learning Theory, volume 99 of Proceedings of Machine Learning Research, pages 2803–2830. PMLR, 2019.
  38. [38] Miloš S. Stanković, Nemanja Ilić, and Srdjan S. Stanković. Distributed stochastic approximation: Weak convergence and network design. IEEE Transactions on Automatic Control, 61(12):4069–4074, 2016.
  39. [39] Srdjan S. Stanković, Miloš S. Stanković, and Dušan M. Stipanović. Decentralized parameter estimation by consensus based stochastic approximation. IEEE Transactions on Automatic Control, 56(3):531–543, 2011.
  40. [40] Bartolomeo Stellato, Goran Banjac, Paul Goulart, Alberto Bemporad, and Stephen Boyd. OSQP: An operator splitting solver for quadratic programs. Mathematical Programming Computation, 12:637–672, 2020.
  41. [41] Haixiang Sun and Ye Shi. Understanding representation of deep equilibrium models from neural collapse perspective. Advances in Neural Information Processing Systems, 37:9634–9667, 2024.
  42. [42] Jun Sun, Gang Wang, Georgios B. Giannakis, Qinmin Yang, and Zaiyue Yang. Finite-time analysis of decentralized temporal-difference learning with linear function approximation. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research. PMLR, 2020.
  43. [43] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning based on team reward. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018.
  44. [44] Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9–44, 1988.
  45. [45] Ming Tan. Multi-agent reinforcement learning: Independent versus cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, ICML '93, pages 330–337. Morgan Kaufmann Publishers Inc., 1993.
  46. [46] J Terry, Benjamin Black, Nathaniel Grammel, Mario Jayakumar, Ananth Hari, Ryan Sullivan, Luis S Santos, Clemens Dieffendahl, Caroline Horsch, Rodrigo Perez-Vicente, et al. PettingZoo: Gym for multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 34:15032–15043, 2021.
  47. [47] Andrea Toselli and Olof B. Widlund. Domain Decomposition Methods: Algorithms and Theory. Springer, 2005.
  48. [48] John N. Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3):185–202, 1994.
  49. [49] Harm van Seijen, Hado van Hasselt, Shimon Whiteson, and Marco Wiering. A theoretical and empirical analysis of expected SARSA. In Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pages 177–184. IEEE, 2009.
  50. [50] Gang Wang, Songtao Lu, Georgios B. Giannakis, Gerald Tesauro, and Jian Sun. Decentralized TD tracking with linear function approximation and its finite-time analysis. In Advances in Neural Information Processing Systems, volume 33, 2020.
  51. [51] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3–4):279–292, 1992.
  52. [52] Hua Wei, Chacha Chen, Guanjie Zheng, Kan Wu, Vikash V. Gayah, Kai Xu, and Zhenhui Li. PressLight: Learning max pressure control to coordinate traffic signals in arterial network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1290–1298. ACM, 2019.
  53. [53] Hua Wei, Nan Xu, Huichu Zhang, Guanjie Zheng, Xinshi Zang, Chacha Chen, Weinan Zhang, Yanmin Zhu, Kai Xu, and Zhenhui Li. CoLight: Learning network-level cooperation for traffic signal control. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 1913–1922. ACM, 2019.
  54. [54] Ezra Winston and J. Zico Kolter. Monotone operator equilibrium networks. In Advances in Neural Information Processing Systems 33, 2020.
  55. [55] Lin Xiao and Stephen Boyd. Fast linear iterations for distributed averaging. Systems & Control Letters, 53(1):65–78, 2004.
  56. [56] Sihan Zeng, Thinh T. Doan, and Justin Romberg. Finite-time convergence rates of decentralized stochastic approximation with applications in multi-agent and multi-task learning. IEEE Transactions on Automatic Control, 68(5):2758–2773, 2023.
  57. [57] Kaiqing Zhang, Zhuoran Yang, and Tamer Başar. Decentralized multi-agent reinforcement learning with networked agents: Recent advances. Frontiers of Information Technology & Electronic Engineering, 22(6):802–814, 2021.
  58. [58] Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, and Tamer Başar. Fully decentralized multi-agent reinforcement learning with networked agents. In Proceedings of the 35th International Conference on Machine Learning, pages 5872–5881, 2018.
  59. [59] Related-work excerpt (truncated in extraction): Watkins and Dayan [51] introduced Q-learning, and Tsitsiklis [48] together with Jaakkola, Jordan, and Singh [19] established convergence for asynchronous stochastic approximation and iterative dynamic programming; the same mean-operator perspective covers TD learning [44], SARSA [33], Expected SARSA [49], and finite-sample analyses of linear SA and TD learning [5, 37].
  60. [60] Related-work excerpt (truncated in extraction): [58] analyzed fully decentralized learning with networked agents, [13] and [42] gave finite-time analyses for distributed TD, [50] developed decentralized TD tracking, [56] established finite-time theory for decentralized stochastic approximation with fixed points, and [57] surveys the broader area; a second line covers the multi-agent RL benchmarks and baselines used in the experiments.