arxiv: 2605.04448 · v1 · submitted 2026-05-06 · 💻 cs.NI · cs.SY· eess.SY

Recognition: unknown

Queue-Aware and Resilient Routing in LEO Satellite Networks Using Multi-Agent Reinforcement Learning

Mahyar Tajeri, Mudassar Liaq, Peng Hu

Pith reviewed 2026-05-08 17:18 UTC · model grok-4.3

classification 💻 cs.NI cs.SYeess.SY

keywords LEO satellite networksmulti-agent reinforcement learningsatellite routingqueue-aware routingresilient routingnetwork scalabilityreinforcement learning

0 comments

The pith

A multi-agent reinforcement learning system lets each LEO satellite choose routes locally using queue and resilience data, reducing overhead to roughly half that of Dijkstra while handling growing traffic loads.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a queue-aware multi-agent deep reinforcement learning framework for routing in LEO satellite networks. Each satellite operates as an independent agent that selects next hops based on local observations of queue backlogs, background traffic, and a resilience score. The method targets the limitations of conventional algorithms like Dijkstra, which assume static metrics and incur high recomputation and signaling costs in dynamic topologies with frequent link failures. Simulations compare the approach to SARSA and Dijkstra, showing competitive latency alongside substantially lower overhead and better scalability as network size and traffic increase.

Core claim

The authors formulate routing as a latency-aware optimization solved distributively by multi-agent DRL, where each satellite agent learns a policy that incorporates queue dynamics at every node and a resilience term to favor stable paths. This yields a solution that avoids global state collection and frequent path recalculation, delivering lower overhead than Dijkstra at fixed recalculation intervals while maintaining queue control and robustness under rising load.

What carries the argument

Multi-agent deep reinforcement learning where each satellite is modeled as an autonomous agent that observes local queue lengths and link states to select the next hop according to a trained policy minimizing a combined latency, backlog, and resilience objective.

Load-bearing premise

The simulation environment used for training and testing accurately reproduces real LEO orbital dynamics, traffic patterns, and link failure statistics.

What would settle it

Running the trained policies on a high-fidelity orbital emulator fed with actual measured satellite traffic traces and failure logs, then measuring whether overhead stays near 50 percent of Dijkstra and queue backlogs remain controlled.

Figures

Figures reproduced from arXiv: 2605.04448 by Mahyar Tajeri, Mudassar Liaq, Peng Hu.

**Figure 1.** Figure 1: Latency comparison of Dijkstra, SARSA, and MA-DRL. view at source ↗

**Figure 3.** Figure 3: Aggregate path changes for the Dijkstra algorithm. view at source ↗

**Figure 4.** Figure 4: Comparison of resilience scores across different routing algorithms view at source ↗

read the original abstract

With the rapid growth in data demand and stringent latency requirements of modern applications has driven significant interest in Low Earth Orbit (LEO) satellite constellations as an emerging solution for global Internet coverage. However, routing in LEO networks remains a fundamental challenge due to highly dynamic topologies, time-varying traffic conditions, and its susceptibility to link failures. Conventional routing algorithms typically assume static link metrics and fail to account for queue backlogs or real-time system variations, making them less effective in such environments. We propose a queue-aware multi-agent deep reinforcement learning (MA-DRL) framework for routing in LEO satellite networks. Each satellite is modeled as an independent agent responsible for making local routing decisions, enabling a distributed and scalable solution. The proposed framework formulates a latency-aware optimization problem that incorporates background traffic, queue dynamics at each satellite, and a resilience score to improve robustness. We evaluate the proposed approach against the state-action-reward-state-action (SARSA) and Dijkstra algorithms. While Dijkstra achieves the lowest end-to-end latency under ideal conditions, its computational and signaling overhead becomes a significant bottleneck as the network scales. In contrast, our proposed approach incurs significantly lower overhead (approximately 50% of Dijkstra at a 5 s recalculation interval), scales efficiently with network size, and effectively manages queue backlogs and resilience under increasing traffic load, demonstrating enhanced robustness and scalability in LEO satellite networks while maintaining competitive latency and resilience scores.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies multi-agent DRL to LEO routing with queue and resilience terms and reports lower overhead than Dijkstra in simulation, but the gains rest on untested simulator fidelity.

read the letter

The main thing here is a straightforward extension of multi-agent deep reinforcement learning to routing in LEO constellations. Each satellite acts as its own agent, the reward includes queue backlog and a resilience score, and the evaluation pits the result against Dijkstra and SARSA. In the reported runs the method cuts overhead to roughly half of Dijkstra at a five-second recalculation interval while keeping latency competitive and handling rising traffic without obvious queue blow-ups. That combination is not in the cited prior work, so the specific formulation counts as new for this domain. The distributed setup is a practical plus for scaling to thousands of satellites, and the authors correctly note that centralized shortest-path methods become expensive as the topology changes rapidly. The simulation results are presented clearly enough to show the intended trade-offs. The soft spot is the simulation environment itself. The abstract and stress-test note give no sensitivity checks on orbital mechanics accuracy, correlated link failures, or background traffic burstiness. Without those, the 50 % overhead claim and the assumption that policies will transfer to orbit remain unproven. No tables, confidence intervals, or training-curve details appear in the summary, which makes it hard to judge statistical reliability or reproducibility. This work is aimed at researchers and engineers who already work on LEO routing or on RL for dynamic networks. Someone looking for a ready-to-deploy algorithm will find the formulation useful as a starting point, but will still need to replicate and stress-test the simulator. It is coherent on its own terms and engages the relevant baselines, so it deserves a serious referee to check the implementation details and run additional validation experiments.

Referee Report

2 major / 1 minor

Summary. The paper proposes a queue-aware multi-agent deep reinforcement learning (MA-DRL) framework for routing in LEO satellite networks, where each satellite acts as an independent agent making local decisions. It formulates a latency-aware optimization incorporating background traffic, queue dynamics, and a resilience score, and evaluates the approach against SARSA and Dijkstra baselines. The central claims are that the method achieves approximately 50% lower overhead than Dijkstra at a 5 s recalculation interval, scales efficiently with network size, and effectively manages queue backlogs and resilience under increasing load while maintaining competitive latency.

Significance. If the simulation results prove robust and the learned policies transfer beyond the custom simulator, the work could provide a distributed, scalable routing solution for highly dynamic LEO constellations that reduces signaling overhead relative to centralized or frequent-recalculation methods while addressing queue and failure resilience.

major comments (2)

[Abstract] Abstract: comparative performance claims (e.g., ~50% overhead reduction versus Dijkstra) are stated without any numerical tables, confidence intervals, variance across runs, or description of training stability, preventing verification of statistical reliability.
[Evaluation (implied)] The evaluation relies on a custom simulator whose fidelity to real LEO orbital mechanics, link-failure correlations, and background traffic burstiness is not validated via sensitivity analysis; this is load-bearing for the claimed overhead reduction and policy transfer.

minor comments (1)

[Abstract] The abstract would benefit from explicit quantitative results (means, standard deviations) rather than qualitative descriptors such as 'significantly lower' and 'competitive'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on statistical presentation and simulator validation. We address both major points below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: comparative performance claims (e.g., ~50% overhead reduction versus Dijkstra) are stated without any numerical tables, confidence intervals, variance across runs, or description of training stability, preventing verification of statistical reliability.

Authors: We agree that the abstract would benefit from additional statistical context to support the ~50% overhead claim. The full evaluation section already reports results from multiple independent runs with variance, but these details are not summarized in the abstract. In the revised version, we will update the abstract to include specific numerical values (e.g., overhead reduction of 48-52% with 95% confidence intervals across 10 runs) and a brief statement on training convergence stability. This change will be limited to the abstract while preserving its length. revision: yes
Referee: [Evaluation (implied)] The evaluation relies on a custom simulator whose fidelity to real LEO orbital mechanics, link-failure correlations, and background traffic burstiness is not validated via sensitivity analysis; this is load-bearing for the claimed overhead reduction and policy transfer.

Authors: We acknowledge the importance of demonstrating simulator robustness. The custom simulator uses standard Keplerian orbital models, published LEO constellation parameters (e.g., altitude, inclination), and Poisson arrivals for background traffic, consistent with prior LEO routing literature. However, we did not include explicit sensitivity analysis on parameters such as correlated link failures or bursty traffic models. In the revision, we will add a dedicated sensitivity analysis subsection that varies link failure correlation coefficients, traffic burstiness (via Pareto distributions), and minor orbital perturbations, confirming that the overhead reduction and queue/resilience benefits remain consistent. This will directly address concerns about policy transfer and result reliability. revision: yes

Circularity Check

0 steps flagged

No significant circularity; MA-DRL routing claims rest on independent simulation evaluation against baselines

full rationale

The paper formulates a latency-aware optimization objective incorporating queue dynamics, background traffic, and resilience, then trains a distributed MA-DRL policy in a custom simulator and reports empirical results (overhead, latency, scalability) by direct comparison to Dijkstra and SARSA. No equation reduces a prediction to a fitted input by construction, no uniqueness theorem is imported via self-citation, and no ansatz or renaming is smuggled in. The derivation chain is self-contained against the stated simulation environment and external baselines, so circularity burden remains low.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that a learned policy trained in simulation will generalize to real orbital dynamics and traffic; no explicit free parameters are named in the abstract, but RL reward weights and network topology models are implicit.

axioms (1)

domain assumption Simulation environment faithfully reproduces LEO orbital mechanics, link failures, and traffic statistics
Stated implicitly when claiming transfer from training to deployment

pith-pipeline@v0.9.0 · 5565 in / 1183 out tokens · 22134 ms · 2026-05-08T17:18:56.668034+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

16 extracted references · 1 canonical work pages

[1]

Delay is not an option: Low latency routing in space,

M. Handley, “Delay is not an option: Low latency routing in space,” inProceedings of the 17th ACM Workshop on Hot Topics in Networks, Nov. 2018, pp. 85–91

2018
[2]

A technical comparison of three low earth orbit satellite constellation systems to provide global broadband,

I. Del Portillo, B. G. Cameron, and E. F. Crawley, “A technical comparison of three low earth orbit satellite constellation systems to provide global broadband,”Acta astronautica, vol. 159, pp. 123–135, Jul. 2019

2019
[3]

Free-space optical (FSO) satellite networks performance analysis: Transmission power, latency, and outage probability,

J. Liang, A. U. Chaudhry, E. Erdoganet al., “Free-space optical (FSO) satellite networks performance analysis: Transmission power, latency, and outage probability,”IEEE Open Journal of V ehicular Technology, vol. 5, pp. 244–261, Nov. 2023

2023
[4]

Multi-orbit multibeam satellite soft handover strategy based on rate-splitting multiple access,

S. Han, Z. Li, W. Menget al., “Multi-orbit multibeam satellite soft handover strategy based on rate-splitting multiple access,”IEEE Trans- actions on Wireless Communications, Aug. 2025

2025
[5]

Performance evaluation of multi-shell LEO satellite constellations in 6G communication systems,

W. Li, A. Naqi, X. Zhaiet al., “Performance evaluation of multi-shell LEO satellite constellations in 6G communication systems,”Journal of Information and Intelligence, Oct. 2025

2025
[6]

Optimized multi-path routing for IoT connectivity in LEO satellite networks: A load-balanced approach,

J. K. Rai, B. Abudehais, A. Bhattacharyaet al., “Optimized multi-path routing for IoT connectivity in LEO satellite networks: A load-balanced approach,” in2025 IEEE World AI IoT Congress (AIIoT). IEEE, May 2025, pp. 0940–0949

2025
[7]

Q-learning for distributed routing in LEO satellite constellations,

B. Soret, I. Leyva-Mayorga, F. Lozano-Cuadraet al., “Q-learning for distributed routing in LEO satellite constellations,” in2024 IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN). IEEE, Aug. 2024, pp. 208–213

2024
[8]

Continual deep reinforcement learning for decentralized satellite routing,

F. Lozano-Cuadra, B. Soret, I. Leyva-Mayorgaet al., “Continual deep reinforcement learning for decentralized satellite routing,”IEEE Trans- actions on Communications, Oct. 2025

2025
[9]

Visibility-aware Satellite Selection and Resource Allocation in Multi-Orbit LEO Networks,

Y . Sun, Y . Gao, M. Xiaoet al., “Visibility-aware Satellite Selection and Resource Allocation in Multi-Orbit LEO Networks,”arXiv preprint arXiv:2511.12678, Nov. 2025

work page arXiv 2025
[10]

Integrating LEO satellites and multi-UA V reinforcement learning for hybrid FSO/RF non-terrestrial networks,

J.-H. Lee, J. Park, M. Benniset al., “Integrating LEO satellites and multi-UA V reinforcement learning for hybrid FSO/RF non-terrestrial networks,”IEEE Transactions on V ehicular Technology, vol. 72, no. 3, pp. 3647–3662, Mar. 2023

2023
[11]

Collaborative Multi-Band Multi-Orbit 6G Satellites with Multi-Connectivity: The Roles of Carrier Aggregation and Dual Connectivity,

M. Al-Ansi, J. Querol, E. Lagunaset al., “Collaborative Multi-Band Multi-Orbit 6G Satellites with Multi-Connectivity: The Roles of Carrier Aggregation and Dual Connectivity,”Authorea Preprints, May 2025

2025
[12]

Achieving resilient and performance- guaranteed routing in space-terrestrial integrated networks,

Z. Lai, H. Li, Y . Wanget al., “Achieving resilient and performance- guaranteed routing in space-terrestrial integrated networks,” inIEEE IN- FOCOM 2023-IEEE Conference on Computer Communications. IEEE, Aug. 2023, pp. 1–10

2023
[13]

Resilience evaluation and optimization method of large-scale leo satellite networks based on entropy theory,

J. Zhou, X. Zhu, and H. Yao, “Resilience evaluation and optimization method of large-scale leo satellite networks based on entropy theory,” IEEE Internet of Things Journal, Sep. 2025

2025
[14]

Resilience of mega-satellite constellations: How node failures impact inter-satellite networking over time?

B. Guo, Z. Xiong, Z. Zhanget al., “Resilience of mega-satellite constellations: How node failures impact inter-satellite networking over time?”IEEE Transactions on Communications, Sep. 2025

2025
[15]

Inter-plane inter-satellite connectivity in dense LEO constellations,

I. Leyva-Mayorga, B. Soret, and P. Popovski, “Inter-plane inter-satellite connectivity in dense LEO constellations,”IEEE Transactions on Wire- less Communications, vol. 20, no. 6, pp. 3430–3443, Jun. 2021

2021
[16]

Multi-LEO satellite networks for integrated access and backhaul: Outage performance analysis,

A. Abdulkarim and B. Maham, “Multi-LEO satellite networks for integrated access and backhaul: Outage performance analysis,” in2024 32nd European Signal Processing Conference (EUSIPCO). IEEE, Oct. 2024, pp. 2097–2101

2024