pith. machine review for the scientific record. sign in

arxiv: 2605.01748 · v1 · submitted 2026-05-03 · 💻 cs.NI

Recognition: unknown

GATE: GPU-Accelerated Traffic Engineering for the WAN

Alexander Krentsel, Brighten Godfrey, Rahul Bothra, Rob Shakir, R. Srikant, Saptarshi Mandal, Sylvia Ratnasamy

Authors on Pith no claims yet

Pith reviewed 2026-05-09 16:48 UTC · model grok-4.3

classification 💻 cs.NI
keywords traffic engineeringGPU accelerationwide area networksoptimizationroutingfairness objectivesconvergence
0
0 comments X

The pith

GATE uses GPU-parallel decomposition to solve traffic engineering problems optimally up to 10 times faster than prior methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GATE, a system that accelerates traffic engineering in wide-area networks by breaking the optimization problem into parts that GPUs can compute in parallel. It shows that this approach not only runs much faster but still reaches solutions that are provably optimal or near-optimal for various fairness goals. For network operators managing large clouds, this means they can adjust routes quickly when demands change without sacrificing efficiency. The evaluation on real traces from big WANs demonstrates the speedup.

Core claim

GATE achieves near-optimal solutions 5-10x faster than state-of-the-art by using a highly-parallelizable GPU-compatible decomposition that iteratively converges to the provably optimal solution for a wide spectrum of fairness objectives.

What carries the argument

The GPU-compatible decomposition of the traffic engineering optimization problem, which allows parallel computation while ensuring convergence.

If this is right

  • Network operators can respond to demand shifts or failures in seconds instead of longer runtimes.
  • The same solver handles multiple fairness objectives without redesign.
  • Larger networks gain more from the parallelism, improving scalability.
  • Theoretical convergence bounds allow safe use of near-optimal results after limited iterations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The decomposition technique could extend to other network resource allocation problems that currently rely on slow solvers.
  • Real-time TE might become standard in highly dynamic environments like edge computing.
  • Production systems could reduce dependence on heuristic approximations for critical routing decisions.

Load-bearing premise

The traffic engineering optimization problem admits a decomposition that is highly parallelizable on GPUs and that the iterative method converges to the optimal solution for a wide spectrum of fairness objectives.

What would settle it

A case where GATE's decomposition fails to converge to optimal or where its GPU implementation shows no significant speedup over CPU methods on production-scale networks.

Figures

Figures reproduced from arXiv: 2605.01748 by Alexander Krentsel, Brighten Godfrey, Rahul Bothra, Rob Shakir, R. Srikant, Saptarshi Mandal, Sylvia Ratnasamy.

Figure 1
Figure 1. Figure 1: Steps in one iteration of GATE. All blocks within a step are computed parallelly. Each steps is computed sequentially. values such that no flow deviates disproportionately from their current rates xr . The mathematical representation of these two considerations reflects in Eq. (12), and the rate suggestions is computed in Eq. (13), where ne is the number of paths traversing edge e. All |E| ∗|P| instances o… view at source ↗
Figure 2
Figure 2. Figure 2: Optimality (relative to max-min fair) of α-utility functions with increasing α. 0 2 4 6 8 10 Rates allocated (x) 0 5 10 O bje c tiv e U(x) α = 0 (Max throughput) α = 0.5 α = 2 α = 10 ( → max-min fair) α = 0.25 α = 1 α = 3 view at source ↗
Figure 3
Figure 3. Figure 3: Curve of various α-utility objective functions (larger α swings more). α-increments Direct optimization of the α-fair utility func￾tion (Eq. (4)) for large values of α introduces numerical stiff￾ness. When α ≫ 1, the utility curve becomes exceedingly sharp, as shown in view at source ↗
Figure 5
Figure 5. Figure 5: Drift Adjusted Optimality for different TE algorithms over WAN B. 0.70 0.75 0.80 0.85 0.90 0.95 1.00 Drift Adjusted Optimality (DAO) 0 25 50 75 100 CDF across snapshots SWAN Soroush k-Waterfill Danna GATE view at source ↗
Figure 7
Figure 7. Figure 7: Runtime and Optimality for TE algo￾rithms over Cogentco with uniform link capaci￾ties. Abilene (11) GEANT (23) ESNet (68) Cogentco (197) 10 0 10 1 10 2 10 3 Runtime (s) Exact (Proportional Fairness) DeDe (Proportional Fairness) GATE (Proportional Fairness) view at source ↗
Figure 9
Figure 9. Figure 9: Runtimes of TE algorithms over a pe￾riod of 3 years as WAN B grows in number of routers. Abilene (11) GEANT (23) ESNet (68) Cogentco (197) WAN B (O(1000)) 10 −1 10 0 10 1 10 2 10 3 Runtime (s) Danna GATE k-Waterfill SWAN Soroush view at source ↗
Figure 12
Figure 12. Figure 12: GATE’s intermediate values with WAN A as input – (a) top: optimality after projection and learning rates, (b) middle: Capacity violations (without projection) and, (c) bottom: Demand violations (without projection). ~1.11x. In that same period, the number of iterations required to converge increases by ~1.45x of the original. GATE outperforms other SOTA TE on all but the smallest (sub-30 router) topologie… view at source ↗
Figure 13
Figure 13. Figure 13: GATE runtime with increasing GPU threads A Evaluation Continued Total flow served by GATE is at-par with other TE solu￾tions. While the evaluated TE solutions optimize for max-min fairness, we also review the total flow served by these algo￾rithms view at source ↗
read the original abstract

Traffic engineering (TE) has become a crucial tool for enforcing routing policy and maintaining operational efficiency in large networks. Existing TE solutions pick an objective function to optimize, aiming to balance (i) allocating traffic optimally with (ii) reacting quickly to demand changes and disruption events. However, as the scale of networks grows, the runtime of the existing optimal solution becomes infeasibly large. The alternative - approximate solvers - result in costly inefficiencies. We present GPU-Accelerated Traffic Engineering (GATE), which achieves the best of both worlds: enabling fast TE runtimes through a highly-parallelizable GPU-compatible decomposition, while iteratively converging to the provably optimal solution. GATE unlocks a unique set of desirable properties: it becomes increasingly parallelizable with network size, supports a wide spectrum of fairness objectives, and offers theoretically guaranteed convergence to the optimal solution and near-optimal convergence within a bounded time. We evaluate GATE on production traces from two large cloud WANs, and show that GATE achieves near-optimal solutions 5-10x faster than state-of-the-art.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces GATE, a GPU-accelerated decomposition algorithm for traffic engineering (TE) in wide-area networks. It claims to enable highly parallelizable computation on GPUs while iteratively converging to the provably optimal solution for a wide spectrum of fairness objectives, with theoretical convergence guarantees and near-optimal performance within bounded time. Evaluation on production traces from two large cloud WANs reports 5-10x speedups over state-of-the-art solvers.

Significance. If the decomposition and convergence properties hold, GATE would meaningfully advance scalable TE by closing the gap between slow exact solvers and fast but suboptimal approximations in large cloud networks. The use of real production traces from two WANs is a positive aspect that grounds the performance claims.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (method): The claim of convergence to the 'provably optimal solution' for a 'wide spectrum of fairness objectives' rests on unstated separability and convexity assumptions in the decomposition. Standard formulations with alpha-fairness (alpha ≠ 1), max-min, or lex-max objectives are nonlinear and couple flows across paths; without explicit conditions showing that the chosen blocks remain convex and the fixed point is the global optimum (rather than a stationary point of a relaxation), the guarantee does not hold for arbitrary objectives or dense traffic matrices.
  2. [Abstract and §5] Abstract and §5 (evaluation): The 5-10x speedup to 'near-optimal' solutions is reported on production traces, but the abstract provides no details on iteration counts, optimality gaps, baseline solvers, or error analysis. This makes it impossible to assess whether the speedup is independent of the fairness parameter or holds only for specific cases where the decomposition converges quickly.
minor comments (2)
  1. [§2] Notation for the fairness objectives and the decomposition blocks should be introduced earlier and used consistently to aid readability.
  2. [Abstract] The abstract mentions 'theoretically guaranteed convergence' but does not reference the specific theorem or proof sketch; adding a forward pointer would help.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below and will revise the manuscript to incorporate clarifications and additional details where appropriate.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (method): The claim of convergence to the 'provably optimal solution' for a 'wide spectrum of fairness objectives' rests on unstated separability and convexity assumptions in the decomposition. Standard formulations with alpha-fairness (alpha ≠ 1), max-min, or lex-max objectives are nonlinear and couple flows across paths; without explicit conditions showing that the chosen blocks remain convex and the fixed point is the global optimum (rather than a stationary point of a relaxation), the guarantee does not hold for arbitrary objectives or dense traffic matrices.

    Authors: We appreciate the referee pointing out the need for explicit assumptions. Section 3 of the manuscript defines the block decomposition and states that convergence holds when the objective is separable across blocks and each subproblem is convex. For the supported fairness objectives (including alpha-fairness for alpha > 0, max-min, and lex-max), the utility functions are concave and the flow conservation constraints are linear, preserving convexity of the blocks. The fixed point of the iterative procedure coincides with the global optimum of the original problem under these conditions. We will revise §3 to explicitly enumerate the separability and convexity requirements, add a short proof sketch confirming the fixed point is the global optimum (not merely a stationary point of a relaxation), and clarify that the guarantee applies to the spectrum of objectives for which these properties hold rather than arbitrary nonlinear objectives. revision: yes

  2. Referee: [Abstract and §5] Abstract and §5 (evaluation): The 5-10x speedup to 'near-optimal' solutions is reported on production traces, but the abstract provides no details on iteration counts, optimality gaps, baseline solvers, or error analysis. This makes it impossible to assess whether the speedup is independent of the fairness parameter or holds only for specific cases where the decomposition converges quickly.

    Authors: We agree that additional quantitative details in the abstract and evaluation section would improve clarity. In the revised manuscript we will expand §5 to report observed iteration counts, measured optimality gaps relative to exact solvers, the specific baseline solvers employed, and an analysis of performance variation across fairness parameters. The abstract will be updated with a concise statement summarizing these aspects. This will allow readers to verify that the reported speedups are consistent across the tested range of fairness objectives on the production traces. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on algorithmic decomposition and external evaluation

full rationale

The paper describes GATE as a GPU-parallelizable decomposition of the TE problem with claimed theoretical convergence guarantees to the optimal solution for a range of fairness objectives. No equations or steps in the provided abstract reduce a prediction or optimality claim to a fitted parameter, self-definition, or self-citation chain by construction. The speedup and near-optimality results are presented as outcomes of evaluation on production traces from two cloud WANs, which are independent of the derivation. This is consistent with a self-contained algorithmic contribution whose central claims do not collapse to renaming or fitting inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that standard convex optimization formulations of TE admit a parallelizable decomposition whose iterative updates converge to optimality; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Traffic engineering can be formulated as an optimization problem that admits a highly parallelizable decomposition suitable for GPU execution.
    Core premise enabling the GPU acceleration and iterative convergence.
  • domain assumption The iterative method converges to the provably optimal solution for a wide spectrum of fairness objectives.
    Stated as a theoretical guarantee in the abstract.

pith-pipeline@v0.9.0 · 5502 in / 1251 out tokens · 24680 ms · 2026-05-09T16:48:40.695063+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 1 canonical work pages

  1. [1]

    https: //atscaleconference.com/events/ networking-scale-2024/, Jan

    Networking @scale 2024. https: //atscaleconference.com/events/ networking-scale-2024/, Jan. 2024. Accessed: 2026-2-6

  2. [2]

    Abuzaid, P

    F. Abuzaid, P. Bailis, and M. Zaharia. Solving large- scale granular resource allocation problems efficiently with pop. In Proceedings of the 28th ACM Symposium on Operating Systems Principles (SOSP ’21), pages 638–652, 2021

  3. [3]

    Abuzaid, S

    F. Abuzaid, S. Kandula, B. Arzani, I. Menache, M. Za- haria, and P. Bailis. Contracting wide-area network topologies to solve flow problems quickly. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21), pages 175–192, 2021

  4. [4]

    Annamalai

    N. Annamalai. Azure networking updates on security, reliability, and high availability. https://azure.microsoft.com/en-us/blog/ azure-networking-updates-on-security-reliability-and-high-availability/ , Dec. 2025. Accessed: 2026-2-6

  5. [5]

    D. P. Bertsekas and R. G. Gallager. Data Networks. Prentice Hall, Englewood Cliffs, NJ, 3rd edition, 2009

  6. [6]

    S. Boyd, P. Neal, C. Eric, P. Borja, and E. Jonathan. Distributed optimization and statistical learning via the alternating direction method of multipliers.Foundations and Trends® in Machine learning, 3(1):1–122, 2011

  7. [7]

    Danna, A

    E. Danna, A. Hassidim, H. Kaplan, A. Kumar, Y . Man- sour, D. Raz, and M. Segalov. Upward max-min fairness. Journal of the ACM (JACM), 64(1):1–24, 2017

  8. [8]

    Danna, S

    E. Danna, S. Mandal, and A. Singh. A practical algo- rithm for balancing the max-min fairness and throughput objectives in traffic engineering. In 2012 Proceedings IEEE INFOCOM, pages 846–854. IEEE, 2012

  9. [9]

    Data center it capex 5-year forecast report

    Dell’Oro Group. Data center it capex 5-year forecast report. Market research report, Dell’Oro Group, January

  10. [10]

    Cloud Capex to Surge as AI Infrastructure Matures

    Summarized in Light Reading: "Cloud Capex to Surge as AI Infrastructure Matures"

  11. [11]

    Denis, Y

    M. Denis, Y . Yao, A. Hatch, Q. Zhang, C. L. Lim, S. Zhang, K. Sugrue, H. Kwok, M. J. Fernandez, P. La- pukhov, et al. EBB: Reliable and evolvable express backbone network in meta. In Proceedings of the ACM SIGCOMM 2023 Conference, pages 346–359, 2023

  12. [12]

    Feamster, J

    N. Feamster, J. Rexford, and E. Zegura. The road to SDN: an intellectual history of programmable networks. SIGCOMM Comput. Commun. Rev., 44(2):87–98, Apr. 2014

  13. [13]

    J. He, M. Bresler, M. Chiang, and J. Rexford. To- wards robust multi-layer traffic engineering: Optimiza- tion of congestion control and routing. IEEE Journal on selected areas in communications, 25(5):868–880, 2007

  14. [14]

    M. R. Hestenes. Multiplier and gradient meth- ods. Journal of Optimization Theory and Applications, 4:303–320, 1969

  15. [15]

    C.-Y . Hong, S. Kandula, R. Mahajan, M. Zhang, V . Gill, M. Nanduri, and R. Wattenhofer. Achieving high uti- lization with software-driven wan. In Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, pages 15–26. ACM, 2013

  16. [16]

    S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart, and A. Vahdat. B4: Experi- ence with a globally deployed software defined wan. In Proceedings of the ACM SIGCOMM Conference, Hong Kong, China, 2013

  17. [17]

    L. Jose, S. Ibanez, M. Alizadeh, and N. McKeown. A distributed algorithm to calculate max-min fair rates without per-flow state. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(2):1–25, 2019

  18. [18]

    Jurkiewicz

    P. Jurkiewicz. Topohub: A repository of reference gabriel graph and real-world topologies for networking research. SoftwareX, 24:101540, 2023. 13

  19. [19]

    F. P. Kelly, A. K. Maulloo, and D. K. H. Tan. Rate con- trol for communication networks: shadow prices, propor- tional fairness and stability. Journal of the Operational Research society, 49(3):237–252, 1998

  20. [20]

    Kikuta, E

    K. Kikuta, E. Oki, N. Yamanaka, N. Togawa, and H. Nakazato. Effective parallel algorithm for gpgpu- accelerated explicit routing optimization. In 2015 IEEE Global Communications Conference (GLOBECOM), pages 1–6, 2015

  21. [21]

    B. Koley. Google’s AI-powered next-gen global net- work: Built for the gemini era. Google Cloud Next 25, Apr. 2025

  22. [22]

    Krentsel, N

    A. Krentsel, N. Saran, B. Koley, S. Mandal, A. Narayanan, S. Ratnasamy, A. Al-Shabibi, A. Shaikh, R. Shakir, A. Singla, and H. Weatherspoon. A decen- tralized sdn architecture for the wan. In Proceedings of the ACM SIGCOMM 2024 Conference, ACM SIGCOMM ’24, page 938–953, New York, NY , USA,

  23. [23]

    Association for Computing Machinery

  24. [24]

    Krishnaswamy, R

    U. Krishnaswamy, R. Singh, P. Mattes, P.-A. C. Bis- sonnette, N. Bjørner, Z. Nasrin, S. Kothari, P. Reddy, J. Abeln, S. Kandula, H. Raj, L. Irun-Briz, J. Gaudette, and E. Lan. OneWAN is better than two: Unifying a split WAN architecture. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 515–529, Boston, MA, Apr. 2023...

  25. [25]

    S. Liao. On the homotopy analysis method for nonlin- ear problems. Applied Mathematics and Computation, 147(2):499–513, 2004

  26. [26]

    Lin and N

    X. Lin and N. B. Shroff. Utility maximization for com- munication networks with multipath routing. IEEE Transactions on Automatic Control, 51(5):766–781, 2006

  27. [27]

    S. H. Low and D. E. Lapsley. Optimization flow con- trol. i. basic algorithm and convergence. IEEE/ACM Transactions on networking, 7(6):861–874, 1999

  28. [28]

    Y . Lu, S. Kandula, A. C. König, and S. Chaudhuri. Pre- training summarization models of structured datasets for cardinality estimation. Proceedings of the VLDB Endowment, 15(3):414–426, 2021

  29. [29]

    Marcus, P

    R. Marcus, P. Negi, H. Mao, C. Zhang, M. Al- izadeh, T. Kraska, O. Papaemmanouil, and N. Tat- bul. Neo: A learned query optimizer. arXiv preprint arXiv:1904.03711, 2019

  30. [30]

    Nace and M

    D. Nace and M. Pióro. Max-min fairness and its appli- cations to routing and load-balancing in communication networks: a tutorial. IEEE Communications Surveys & Tutorials, 10(4):5–17, 2009

  31. [31]

    Namyar, B

    P. Namyar, B. Arzani, S. Kandula, S. Segarra, D. Crankshaw, U. Krishnaswamy, R. Govindan, and H. Raj. Solving max-min fair resource allocations quickly on large graphs. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), pages 1937–1958, 2024

  32. [32]

    Orlowski, R

    S. Orlowski, R. Wessäly, M. Pióro, and A. Tomaszewski. Sndlib 1.0—survivable network design library. Networks, 55(3):276–286, 2010

  33. [33]

    Perry, F

    Y . Perry, F. V . Frujeri, C. Hoch, S. Kandula, I. Men- ache, M. Schapira, and A. Tamar. {DOTE}: Rethink- ing (predictive) {WAN} traffic engineering. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 1557–1581, 2023

  34. [34]

    Roughan, A

    M. Roughan, A. Greenberg, C. Kalmanek, M. Rum- sewicz, J. Yates, and Y . Zhang. Experience in measuring backbone traffic variability: models, metrics, measure- ments and meaning. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurment, IMW ’02, page 91–92, New York, NY , USA, 2002. Associa- tion for Computing Machinery

  35. [35]

    R. Srikant. The mathematics of Internet congestion control. Springer, 2004

  36. [36]

    Srikant and L

    R. Srikant and L. Ying. Communication networks: An optimization, control and stochastic networks perspective. Cambridge University Press, 2014

  37. [37]

    Stoer and R

    J. Stoer and R. Bulirsch. Introduction to Numerical Analysis, volume 12 of Texts in Applied Mathematics. Springer-Verlag, New York, 2002

  38. [38]

    X. Wang, Q. Zhang, J. Ren, S. Xu, S. Wang, and S. Yu. Toward efficient parallel routing optimization for large- scale sdn networks using gpgpu. Journal of Network and Computer Applications, 113:1–13, 2018

  39. [39]

    Z. Xu, F. Y . Yan, R. Singh, J. T. Chiu, A. M. Rush, and M. Yu. Teal: Learning-accelerated optimization of wan traffic engineering. In Proceedings of the ACM SIGCOMM 2023 Conference, pages 313–328, 2023

  40. [40]

    Z. Xu, M. Yu, and F. Y . Yan. Decouple and decompose: Scaling resource allocation with dede. In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25). USENIX Asso- ciation, July 2025

  41. [41]

    dual k demst β + Dst − ∑ r∈Pst xk r !# + (19) slackk+1 cape =

    F. Zhang and S. Boyd. Solving large multicommod- ity network flow problems on GPUs. Automation and Remote Control, 86(8):782–795, 2025. 14 Max-flow Danna GATE k-Waterfill SWAN Soroush0.80 0.85 0.90 0.95 1.00 1.05Demand served w.rt. Danna Max-flow Danna GATE k-Waterfill SWAN Soroush Figure 13:GATE runtime with increasing GPU threads A Evaluation Continued ...