arxiv: 2605.01748 · v1 · submitted 2026-05-03 · 💻 cs.NI

Recognition: unknown

GATE: GPU-Accelerated Traffic Engineering for the WAN

Alexander Krentsel, Brighten Godfrey, Rahul Bothra, Rob Shakir, R. Srikant, Saptarshi Mandal, Sylvia Ratnasamy

Authors on Pith no claims yet

Pith reviewed 2026-05-09 16:48 UTC · model grok-4.3

classification 💻 cs.NI

keywords traffic engineeringGPU accelerationwide area networksoptimizationroutingfairness objectivesconvergence

0 comments

The pith

GATE uses GPU-parallel decomposition to solve traffic engineering problems optimally up to 10 times faster than prior methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GATE, a system that accelerates traffic engineering in wide-area networks by breaking the optimization problem into parts that GPUs can compute in parallel. It shows that this approach not only runs much faster but still reaches solutions that are provably optimal or near-optimal for various fairness goals. For network operators managing large clouds, this means they can adjust routes quickly when demands change without sacrificing efficiency. The evaluation on real traces from big WANs demonstrates the speedup.

Core claim

GATE achieves near-optimal solutions 5-10x faster than state-of-the-art by using a highly-parallelizable GPU-compatible decomposition that iteratively converges to the provably optimal solution for a wide spectrum of fairness objectives.

What carries the argument

The GPU-compatible decomposition of the traffic engineering optimization problem, which allows parallel computation while ensuring convergence.

If this is right

Network operators can respond to demand shifts or failures in seconds instead of longer runtimes.
The same solver handles multiple fairness objectives without redesign.
Larger networks gain more from the parallelism, improving scalability.
Theoretical convergence bounds allow safe use of near-optimal results after limited iterations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The decomposition technique could extend to other network resource allocation problems that currently rely on slow solvers.
Real-time TE might become standard in highly dynamic environments like edge computing.
Production systems could reduce dependence on heuristic approximations for critical routing decisions.

Load-bearing premise

The traffic engineering optimization problem admits a decomposition that is highly parallelizable on GPUs and that the iterative method converges to the optimal solution for a wide spectrum of fairness objectives.

What would settle it

A case where GATE's decomposition fails to converge to optimal or where its GPU implementation shows no significant speedup over CPU methods on production-scale networks.

Figures

Figures reproduced from arXiv: 2605.01748 by Alexander Krentsel, Brighten Godfrey, Rahul Bothra, Rob Shakir, R. Srikant, Saptarshi Mandal, Sylvia Ratnasamy.

**Figure 1.** Figure 1: Steps in one iteration of GATE. All blocks within a step are computed parallelly. Each steps is computed sequentially. values such that no flow deviates disproportionately from their current rates xr . The mathematical representation of these two considerations reflects in Eq. (12), and the rate suggestions is computed in Eq. (13), where ne is the number of paths traversing edge e. All |E| ∗|P| instances o… view at source ↗

**Figure 2.** Figure 2: Optimality (relative to max-min fair) of α-utility functions with increasing α. 0 2 4 6 8 10 Rates allocated (x) 0 5 10 O bje c tiv e U(x) α = 0 (Max throughput) α = 0.5 α = 2 α = 10 ( → max-min fair) α = 0.25 α = 1 α = 3 view at source ↗

**Figure 3.** Figure 3: Curve of various α-utility objective functions (larger α swings more). α-increments Direct optimization of the α-fair utility function (Eq. (4)) for large values of α introduces numerical stiffness. When α ≫ 1, the utility curve becomes exceedingly sharp, as shown in view at source ↗

**Figure 5.** Figure 5: Drift Adjusted Optimality for different TE algorithms over WAN B. 0.70 0.75 0.80 0.85 0.90 0.95 1.00 Drift Adjusted Optimality (DAO) 0 25 50 75 100 CDF across snapshots SWAN Soroush k-Waterfill Danna GATE view at source ↗

**Figure 7.** Figure 7: Runtime and Optimality for TE algorithms over Cogentco with uniform link capacities. Abilene (11) GEANT (23) ESNet (68) Cogentco (197) 10 0 10 1 10 2 10 3 Runtime (s) Exact (Proportional Fairness) DeDe (Proportional Fairness) GATE (Proportional Fairness) view at source ↗

**Figure 9.** Figure 9: Runtimes of TE algorithms over a period of 3 years as WAN B grows in number of routers. Abilene (11) GEANT (23) ESNet (68) Cogentco (197) WAN B (O(1000)) 10 −1 10 0 10 1 10 2 10 3 Runtime (s) Danna GATE k-Waterfill SWAN Soroush view at source ↗

**Figure 12.** Figure 12: GATE’s intermediate values with WAN A as input – (a) top: optimality after projection and learning rates, (b) middle: Capacity violations (without projection) and, (c) bottom: Demand violations (without projection). ~1.11x. In that same period, the number of iterations required to converge increases by ~1.45x of the original. GATE outperforms other SOTA TE on all but the smallest (sub-30 router) topologie… view at source ↗

**Figure 13.** Figure 13: GATE runtime with increasing GPU threads A Evaluation Continued Total flow served by GATE is at-par with other TE solutions. While the evaluated TE solutions optimize for max-min fairness, we also review the total flow served by these algorithms view at source ↗

read the original abstract

Traffic engineering (TE) has become a crucial tool for enforcing routing policy and maintaining operational efficiency in large networks. Existing TE solutions pick an objective function to optimize, aiming to balance (i) allocating traffic optimally with (ii) reacting quickly to demand changes and disruption events. However, as the scale of networks grows, the runtime of the existing optimal solution becomes infeasibly large. The alternative - approximate solvers - result in costly inefficiencies. We present GPU-Accelerated Traffic Engineering (GATE), which achieves the best of both worlds: enabling fast TE runtimes through a highly-parallelizable GPU-compatible decomposition, while iteratively converging to the provably optimal solution. GATE unlocks a unique set of desirable properties: it becomes increasingly parallelizable with network size, supports a wide spectrum of fairness objectives, and offers theoretically guaranteed convergence to the optimal solution and near-optimal convergence within a bounded time. We evaluate GATE on production traces from two large cloud WANs, and show that GATE achieves near-optimal solutions 5-10x faster than state-of-the-art.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GATE gives a GPU decomposition for WAN traffic engineering that runs 5-10x faster on real traces while claiming convergence to optimal for multiple fairness goals, though the math on broad applicability needs checking.

read the letter

The main takeaway is that this paper shows how to split the traffic engineering problem into GPU-friendly pieces that scale with network size and still reach near-optimal solutions quickly. They tested it on production traces from two large cloud WANs and report consistent speedups without big quality drops, which is the practical part worth noting first. The approach also claims to handle a range of fairness objectives under one framework with some convergence theory attached. That combination of speed, parallelism, and guarantees is what stands out from the usual trade-off between exact but slow solvers and fast but approximate ones. The real-data evaluation is a clear strength here; too many TE papers stay with synthetic matrices, so seeing results on actual demand patterns adds weight. The idea that the method becomes more parallel as the network grows also lines up with where production systems are headed. The soft spot sits in the convergence claim. The abstract says the iterates reach the provably optimal solution for a wide spectrum of fairness objectives, but standard formulations with alpha-fairness or max-min can couple flows across paths in ways that make clean decomposition tricky. If the paper's method relies on separability or convexity that only holds for certain cases, the fixed point might be optimal only for the relaxed version rather than the original problem, especially on dense traffic matrices. The bounded-time near-optimal part is useful in practice, but it would help to see how iteration counts behave as fairness parameters change. This paper is for researchers and engineers who build or tune TE systems in large WANs. A reader who needs faster runtimes with some optimality backing would get direct value from the runtime numbers and the decomposition sketch. It deserves a serious referee because the problem is concrete, the evaluation uses real traces, and the claims are specific enough to check against the math and experiments.

Referee Report

2 major / 2 minor

Summary. The paper introduces GATE, a GPU-accelerated decomposition algorithm for traffic engineering (TE) in wide-area networks. It claims to enable highly parallelizable computation on GPUs while iteratively converging to the provably optimal solution for a wide spectrum of fairness objectives, with theoretical convergence guarantees and near-optimal performance within bounded time. Evaluation on production traces from two large cloud WANs reports 5-10x speedups over state-of-the-art solvers.

Significance. If the decomposition and convergence properties hold, GATE would meaningfully advance scalable TE by closing the gap between slow exact solvers and fast but suboptimal approximations in large cloud networks. The use of real production traces from two WANs is a positive aspect that grounds the performance claims.

major comments (2)

[Abstract and §3] Abstract and §3 (method): The claim of convergence to the 'provably optimal solution' for a 'wide spectrum of fairness objectives' rests on unstated separability and convexity assumptions in the decomposition. Standard formulations with alpha-fairness (alpha ≠ 1), max-min, or lex-max objectives are nonlinear and couple flows across paths; without explicit conditions showing that the chosen blocks remain convex and the fixed point is the global optimum (rather than a stationary point of a relaxation), the guarantee does not hold for arbitrary objectives or dense traffic matrices.
[Abstract and §5] Abstract and §5 (evaluation): The 5-10x speedup to 'near-optimal' solutions is reported on production traces, but the abstract provides no details on iteration counts, optimality gaps, baseline solvers, or error analysis. This makes it impossible to assess whether the speedup is independent of the fairness parameter or holds only for specific cases where the decomposition converges quickly.

minor comments (2)

[§2] Notation for the fairness objectives and the decomposition blocks should be introduced earlier and used consistently to aid readability.
[Abstract] The abstract mentions 'theoretically guaranteed convergence' but does not reference the specific theorem or proof sketch; adding a forward pointer would help.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below and will revise the manuscript to incorporate clarifications and additional details where appropriate.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (method): The claim of convergence to the 'provably optimal solution' for a 'wide spectrum of fairness objectives' rests on unstated separability and convexity assumptions in the decomposition. Standard formulations with alpha-fairness (alpha ≠ 1), max-min, or lex-max objectives are nonlinear and couple flows across paths; without explicit conditions showing that the chosen blocks remain convex and the fixed point is the global optimum (rather than a stationary point of a relaxation), the guarantee does not hold for arbitrary objectives or dense traffic matrices.

Authors: We appreciate the referee pointing out the need for explicit assumptions. Section 3 of the manuscript defines the block decomposition and states that convergence holds when the objective is separable across blocks and each subproblem is convex. For the supported fairness objectives (including alpha-fairness for alpha > 0, max-min, and lex-max), the utility functions are concave and the flow conservation constraints are linear, preserving convexity of the blocks. The fixed point of the iterative procedure coincides with the global optimum of the original problem under these conditions. We will revise §3 to explicitly enumerate the separability and convexity requirements, add a short proof sketch confirming the fixed point is the global optimum (not merely a stationary point of a relaxation), and clarify that the guarantee applies to the spectrum of objectives for which these properties hold rather than arbitrary nonlinear objectives. revision: yes
Referee: [Abstract and §5] Abstract and §5 (evaluation): The 5-10x speedup to 'near-optimal' solutions is reported on production traces, but the abstract provides no details on iteration counts, optimality gaps, baseline solvers, or error analysis. This makes it impossible to assess whether the speedup is independent of the fairness parameter or holds only for specific cases where the decomposition converges quickly.

Authors: We agree that additional quantitative details in the abstract and evaluation section would improve clarity. In the revised manuscript we will expand §5 to report observed iteration counts, measured optimality gaps relative to exact solvers, the specific baseline solvers employed, and an analysis of performance variation across fairness parameters. The abstract will be updated with a concise statement summarizing these aspects. This will allow readers to verify that the reported speedups are consistent across the tested range of fairness objectives on the production traces. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on algorithmic decomposition and external evaluation

full rationale

The paper describes GATE as a GPU-parallelizable decomposition of the TE problem with claimed theoretical convergence guarantees to the optimal solution for a range of fairness objectives. No equations or steps in the provided abstract reduce a prediction or optimality claim to a fitted parameter, self-definition, or self-citation chain by construction. The speedup and near-optimality results are presented as outcomes of evaluation on production traces from two cloud WANs, which are independent of the derivation. This is consistent with a self-contained algorithmic contribution whose central claims do not collapse to renaming or fitting inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that standard convex optimization formulations of TE admit a parallelizable decomposition whose iterative updates converge to optimality; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

domain assumption Traffic engineering can be formulated as an optimization problem that admits a highly parallelizable decomposition suitable for GPU execution.
Core premise enabling the GPU acceleration and iterative convergence.
domain assumption The iterative method converges to the provably optimal solution for a wide spectrum of fairness objectives.
Stated as a theoretical guarantee in the abstract.

pith-pipeline@v0.9.0 · 5502 in / 1251 out tokens · 24680 ms · 2026-05-09T16:48:40.695063+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 1 canonical work pages

[1]

https: //atscaleconference.com/events/ networking-scale-2024/, Jan

Networking @scale 2024. https: //atscaleconference.com/events/ networking-scale-2024/, Jan. 2024. Accessed: 2026-2-6

2024
[2]

Abuzaid, P

F. Abuzaid, P. Bailis, and M. Zaharia. Solving large- scale granular resource allocation problems efficiently with pop. In Proceedings of the 28th ACM Symposium on Operating Systems Principles (SOSP ’21), pages 638–652, 2021

2021
[3]

Abuzaid, S

F. Abuzaid, S. Kandula, B. Arzani, I. Menache, M. Za- haria, and P. Bailis. Contracting wide-area network topologies to solve flow problems quickly. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21), pages 175–192, 2021

2021
[4]

Annamalai

N. Annamalai. Azure networking updates on security, reliability, and high availability. https://azure.microsoft.com/en-us/blog/ azure-networking-updates-on-security-reliability-and-high-availability/ , Dec. 2025. Accessed: 2026-2-6

2025
[5]

D. P. Bertsekas and R. G. Gallager. Data Networks. Prentice Hall, Englewood Cliffs, NJ, 3rd edition, 2009

2009
[6]

S. Boyd, P. Neal, C. Eric, P. Borja, and E. Jonathan. Distributed optimization and statistical learning via the alternating direction method of multipliers.Foundations and Trends® in Machine learning, 3(1):1–122, 2011

2011
[7]

Danna, A

E. Danna, A. Hassidim, H. Kaplan, A. Kumar, Y . Man- sour, D. Raz, and M. Segalov. Upward max-min fairness. Journal of the ACM (JACM), 64(1):1–24, 2017

2017
[8]

Danna, S

E. Danna, S. Mandal, and A. Singh. A practical algo- rithm for balancing the max-min fairness and throughput objectives in traffic engineering. In 2012 Proceedings IEEE INFOCOM, pages 846–854. IEEE, 2012

2012
[9]

Data center it capex 5-year forecast report

Dell’Oro Group. Data center it capex 5-year forecast report. Market research report, Dell’Oro Group, January
[10]

Cloud Capex to Surge as AI Infrastructure Matures

Summarized in Light Reading: "Cloud Capex to Surge as AI Infrastructure Matures"
[11]

Denis, Y

M. Denis, Y . Yao, A. Hatch, Q. Zhang, C. L. Lim, S. Zhang, K. Sugrue, H. Kwok, M. J. Fernandez, P. La- pukhov, et al. EBB: Reliable and evolvable express backbone network in meta. In Proceedings of the ACM SIGCOMM 2023 Conference, pages 346–359, 2023

2023
[12]

Feamster, J

N. Feamster, J. Rexford, and E. Zegura. The road to SDN: an intellectual history of programmable networks. SIGCOMM Comput. Commun. Rev., 44(2):87–98, Apr. 2014

2014
[13]

J. He, M. Bresler, M. Chiang, and J. Rexford. To- wards robust multi-layer traffic engineering: Optimiza- tion of congestion control and routing. IEEE Journal on selected areas in communications, 25(5):868–880, 2007

2007
[14]

M. R. Hestenes. Multiplier and gradient meth- ods. Journal of Optimization Theory and Applications, 4:303–320, 1969

1969
[15]

C.-Y . Hong, S. Kandula, R. Mahajan, M. Zhang, V . Gill, M. Nanduri, and R. Wattenhofer. Achieving high uti- lization with software-driven wan. In Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM, pages 15–26. ACM, 2013

2013
[16]

S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart, and A. Vahdat. B4: Experi- ence with a globally deployed software defined wan. In Proceedings of the ACM SIGCOMM Conference, Hong Kong, China, 2013

2013
[17]

L. Jose, S. Ibanez, M. Alizadeh, and N. McKeown. A distributed algorithm to calculate max-min fair rates without per-flow state. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(2):1–25, 2019

2019
[18]

Jurkiewicz

P. Jurkiewicz. Topohub: A repository of reference gabriel graph and real-world topologies for networking research. SoftwareX, 24:101540, 2023. 13

2023
[19]

F. P. Kelly, A. K. Maulloo, and D. K. H. Tan. Rate con- trol for communication networks: shadow prices, propor- tional fairness and stability. Journal of the Operational Research society, 49(3):237–252, 1998

1998
[20]

Kikuta, E

K. Kikuta, E. Oki, N. Yamanaka, N. Togawa, and H. Nakazato. Effective parallel algorithm for gpgpu- accelerated explicit routing optimization. In 2015 IEEE Global Communications Conference (GLOBECOM), pages 1–6, 2015

2015
[21]

B. Koley. Google’s AI-powered next-gen global net- work: Built for the gemini era. Google Cloud Next 25, Apr. 2025

2025
[22]

Krentsel, N

A. Krentsel, N. Saran, B. Koley, S. Mandal, A. Narayanan, S. Ratnasamy, A. Al-Shabibi, A. Shaikh, R. Shakir, A. Singla, and H. Weatherspoon. A decen- tralized sdn architecture for the wan. In Proceedings of the ACM SIGCOMM 2024 Conference, ACM SIGCOMM ’24, page 938–953, New York, NY , USA,

2024
[23]

Association for Computing Machinery
[24]

Krishnaswamy, R

U. Krishnaswamy, R. Singh, P. Mattes, P.-A. C. Bis- sonnette, N. Bjørner, Z. Nasrin, S. Kothari, P. Reddy, J. Abeln, S. Kandula, H. Raj, L. Irun-Briz, J. Gaudette, and E. Lan. OneWAN is better than two: Unifying a split WAN architecture. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 515–529, Boston, MA, Apr. 2023...

2023
[25]

S. Liao. On the homotopy analysis method for nonlin- ear problems. Applied Mathematics and Computation, 147(2):499–513, 2004

2004
[26]

Lin and N

X. Lin and N. B. Shroff. Utility maximization for com- munication networks with multipath routing. IEEE Transactions on Automatic Control, 51(5):766–781, 2006

2006
[27]

S. H. Low and D. E. Lapsley. Optimization flow con- trol. i. basic algorithm and convergence. IEEE/ACM Transactions on networking, 7(6):861–874, 1999

1999
[28]

Y . Lu, S. Kandula, A. C. König, and S. Chaudhuri. Pre- training summarization models of structured datasets for cardinality estimation. Proceedings of the VLDB Endowment, 15(3):414–426, 2021

2021
[29]

Marcus, P

R. Marcus, P. Negi, H. Mao, C. Zhang, M. Al- izadeh, T. Kraska, O. Papaemmanouil, and N. Tat- bul. Neo: A learned query optimizer. arXiv preprint arXiv:1904.03711, 2019

work page arXiv 1904
[30]

Nace and M

D. Nace and M. Pióro. Max-min fairness and its appli- cations to routing and load-balancing in communication networks: a tutorial. IEEE Communications Surveys & Tutorials, 10(4):5–17, 2009

2009
[31]

Namyar, B

P. Namyar, B. Arzani, S. Kandula, S. Segarra, D. Crankshaw, U. Krishnaswamy, R. Govindan, and H. Raj. Solving max-min fair resource allocations quickly on large graphs. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24), pages 1937–1958, 2024

1937
[32]

Orlowski, R

S. Orlowski, R. Wessäly, M. Pióro, and A. Tomaszewski. Sndlib 1.0—survivable network design library. Networks, 55(3):276–286, 2010

2010
[33]

Perry, F

Y . Perry, F. V . Frujeri, C. Hoch, S. Kandula, I. Men- ache, M. Schapira, and A. Tamar. {DOTE}: Rethink- ing (predictive) {WAN} traffic engineering. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 1557–1581, 2023

2023
[34]

Roughan, A

M. Roughan, A. Greenberg, C. Kalmanek, M. Rum- sewicz, J. Yates, and Y . Zhang. Experience in measuring backbone traffic variability: models, metrics, measure- ments and meaning. In Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurment, IMW ’02, page 91–92, New York, NY , USA, 2002. Associa- tion for Computing Machinery

2002
[35]

R. Srikant. The mathematics of Internet congestion control. Springer, 2004

2004
[36]

Srikant and L

R. Srikant and L. Ying. Communication networks: An optimization, control and stochastic networks perspective. Cambridge University Press, 2014

2014
[37]

Stoer and R

J. Stoer and R. Bulirsch. Introduction to Numerical Analysis, volume 12 of Texts in Applied Mathematics. Springer-Verlag, New York, 2002

2002
[38]

X. Wang, Q. Zhang, J. Ren, S. Xu, S. Wang, and S. Yu. Toward efficient parallel routing optimization for large- scale sdn networks using gpgpu. Journal of Network and Computer Applications, 113:1–13, 2018

2018
[39]

Z. Xu, F. Y . Yan, R. Singh, J. T. Chiu, A. M. Rush, and M. Yu. Teal: Learning-accelerated optimization of wan traffic engineering. In Proceedings of the ACM SIGCOMM 2023 Conference, pages 313–328, 2023

2023
[40]

Z. Xu, M. Yu, and F. Y . Yan. Decouple and decompose: Scaling resource allocation with dede. In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25). USENIX Asso- ciation, July 2025

2025
[41]

dual k demst β + Dst − ∑ r∈Pst xk r !# + (19) slackk+1 cape =

F. Zhang and S. Boyd. Solving large multicommod- ity network flow problems on GPUs. Automation and Remote Control, 86(8):782–795, 2025. 14 Max-flow Danna GATE k-Waterfill SWAN Soroush0.80 0.85 0.90 0.95 1.00 1.05Demand served w.rt. Danna Max-flow Danna GATE k-Waterfill SWAN Soroush Figure 13:GATE runtime with increasing GPU threads A Evaluation Continued ...

2025