pith. sign in

arxiv: 2606.00835 · v1 · pith:X6MIO3IAnew · submitted 2026-05-30 · 💻 cs.LG

Online Packet Scheduling with Deadlines and Learning

Pith reviewed 2026-06-28 19:06 UTC · model grok-4.3

classification 💻 cs.LG
keywords online packet schedulingdeadlinesbandit feedbackalpha-regretsleeping banditscompetitive ratiostochastic weightsquality of service
0
0 comments X

The pith

In the online packet scheduling with deadlines problem under bandit feedback, algorithms achieve α-regret of order sqrt(KT) matching the standard lower bound and competitive ratios below the golden ratio for finite packet types.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models router decisions on which expiring packet to transmit when values are revealed only after processing as the Online Packet Scheduling with Deadlines problem with partial feedback. It assumes unknown weights are drawn stochastically and reduces the problem to sleeping bandits to minimize α-regret. The authors supply algorithms attaining an α-regret upper bound of order sqrt(KT) that holds across different slackness conditions for both randomized and deterministic cases. In the special case of deadlines at most one step ahead, the deterministic algorithm reaches the tightest possible competitive ratio, and when the number of packet types K is finite and at least two the ratio improves to a value θ_K strictly below the golden ratio.

Core claim

Under a stochastic assumption on packet weights and bandit feedback, the OPSD problem admits algorithms whose α-regret is bounded by Õ(sqrt(KT)), matching the lower bound known for ordinary bandits. For 2-bounded deadline instances the deterministic algorithm attains the provably optimal competitive ratio. When K is finite and at least 2 it is possible to obtain a competitive ratio θ_K lying in the interval [√2, Φ) and thereby strictly beat the classical golden-ratio barrier Φ = (1 + √5)/2.

What carries the argument

The reduction of OPSD to the sleeping bandits problem that enables α-regret minimization.

If this is right

  • The same α-regret bound of Õ(sqrt(KT)) holds for randomized and deterministic algorithms under varying degrees of slackness.
  • In every 2-bounded deadline instance the deterministic algorithm attains the tightest achievable competitive ratio.
  • When the number of distinct packet types K is finite and at least 2, the attainable competitive ratio θ_K lies strictly below Φ.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same reduction technique could be applied to other online selection problems that combine hard deadlines with partial feedback.
  • If real network traffic satisfies the stochastic weight assumption, the derived competitive ratios would translate directly into improved quality-of-service guarantees.
  • Numerical experiments that vary K while keeping deadlines 2-bounded would quantify how quickly θ_K approaches √2.

Load-bearing premise

Packet weights are generated by a fixed but unknown stochastic distribution.

What would settle it

An explicit family of packet arrival sequences and weight realizations in which every algorithm suffers α-regret larger than order sqrt(KT) would refute the claimed upper bound.

Figures

Figures reproduced from arXiv: 2606.00835 by Achraf Azize, Gianmarco Genalti, Vianney Perchet.

Figure 1
Figure 1. Figure 1: Exemplification of the lower bounds instances. Each gadget (of length two) is repeated T times. What happens in the second round of a gadget depends on the action of ALG in the first. To complete the proof, we sum over the epochs: E[GOPT] − θKE[GALGθ,U ] = ∑ E E[G E OPT] − θKE[G E ALGθ,U ] ≤ 2θK ∑ E ∑ i∈KE βi,tE ≤ 2θK ∑ i∈[K] Ni,T ∑ j=1 βi,ti,j ≤ 2cθK ∑ i∈[K] Ni,T ∑ j=1 ¿ÁÁÀ ln KT δ j ≤ Õ( √ KT) , where t… view at source ↗
read the original abstract

Network routers that enforce Quality-of-Service (QoS) guarantees must decide, at every clock cycle, which expiring packet of information to transmit, even when the value of the packet is unknown until it is processed. We frame this problem as the Online Packet Scheduling with Deadlines (OPSD) problem under Partial Feedback: packets arrive at every clock cycle, with different deadlines, but the weights are only observed after execution. Under a stochastic assumption on the unknown weights, we explore different variants of the OPSD problem with bandit feedback. We establish a connection between our setting and the sleeping bandits problem, and set our learning goal to $\alpha$-regret minimization. We provide algorithms with provable $\alpha$-regret guarantees under different spans of slackness, distinguishing systems allowing for randomization and systems that do not. In every scenario, our algorithms achieve an $\alpha$-regret upper bound of $\widetilde{\mathcal{O}}\left(\sqrt{KT}\right)$, matching the lower bound for the standard bandit setting. In the practically relevant case of $2$-bounded deadline instances, where the deadline is set at most one clock cycle away from the arrival, our deterministic algorithm achieves the provably tightest possible competitive ratio. Remarkably, when the number of distinct packet types $K\ge 2$ is finite, it is possible to break the well-established $\Phi = \frac{1+\sqrt{5}}{2}$ competitive ratio barrier and attain a tighter competitive ratio $\theta_K$ ranging in $[\sqrt{2}, \Phi)$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript frames the Online Packet Scheduling with Deadlines (OPSD) problem under partial (bandit) feedback assuming stochastic unknown weights. It reduces the setting to sleeping bandits, targets α-regret minimization, and presents algorithms (randomized and deterministic) achieving an α-regret upper bound of Õ(√(KT)) that matches the standard bandit lower bound. For the practically relevant 2-bounded deadline case, the deterministic algorithm attains a competitive ratio θ_K ∈ [√2, Φ) when the number of packet types K ≥ 2 is finite, breaking the established Φ barrier.

Significance. If the claimed reductions and bounds hold, the work supplies the first learning-theoretic treatment of OPSD with bandit feedback, delivering matching regret upper and lower bounds together with a concrete improvement to the competitive ratio in the finite-K, 2-bounded regime. The explicit connection to sleeping bandits and the finite-K tightening of the competitive ratio are the primary contributions.

major comments (1)
  1. [Theoretical results and proofs] The central α-regret claim of Õ(√(KT)) and its matching lower bound rest on the reduction to sleeping bandits; the provided text does not contain the explicit derivation steps or full proofs that would allow verification of how the deadline constraints and partial feedback are handled in the reduction.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thoughtful review and for highlighting the need for greater clarity on our theoretical contributions. We address the major comment below.

read point-by-point responses
  1. Referee: [Theoretical results and proofs] The central α-regret claim of Õ(√(KT)) and its matching lower bound rest on the reduction to sleeping bandits; the provided text does not contain the explicit derivation steps or full proofs that would allow verification of how the deadline constraints and partial feedback are handled in the reduction.

    Authors: We agree that the manuscript would benefit from more explicit derivation steps. In the revised version we will add a dedicated subsection detailing the reduction from OPSD (with deadlines and bandit feedback) to sleeping bandits. This will include: (i) the precise mapping of packet types, arrival times, and deadlines to arms and sleeping schedules; (ii) how partial feedback is translated into the sleeping-bandit observation model; and (iii) the full proofs of both the Õ(√(KT)) α-regret upper bound (for both randomized and deterministic algorithms) and the matching lower bound. These additions will make the handling of deadline constraints and partial feedback fully verifiable. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations rest on independent reductions and standard bounds

full rationale

The paper frames OPSD as a reduction to sleeping bandits, then derives α-regret bounds of Õ(√(KT)) via new algorithmic constructions that match the known bandit lower bound. The competitive-ratio results for 2-bounded instances (including the θ_K improvement for finite K) are obtained from explicit deterministic algorithms and case analysis, not from parameters fitted to the same data or from self-referential definitions. No step equates a claimed prediction to its own input by construction, and external grounding via standard bandit lower bounds prevents load-bearing self-citation circularity. The central claims therefore remain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on a stochastic assumption for packet weights and standard regret analysis techniques from bandits; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Weights of packets are drawn from an unknown but fixed stochastic distribution.
    Invoked in the abstract to enable bandit feedback analysis and α-regret guarantees.

pith-pipeline@v0.9.1-grok · 5810 in / 1207 out tokens · 18590 ms · 2026-06-28T19:06:38.830896+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

21 extracted references · 2 canonical work pages

  1. [1]

    Theoretical Computer Science , volume=

    Randomized competitive algorithms for online buffer management in the adaptive adversary model , author=. Theoretical Computer Science , volume=. 2011 , publisher=

  2. [2]

    Journal of Discrete Algorithms , volume=

    Online competitive algorithms for maximizing weighted throughput of unit jobs , author=. Journal of Discrete Algorithms , volume=. 2006 , publisher=

  3. [3]

    Algorithmica (New York) , year=

    Online scheduling with partial job values: Does timesharing or randomization help? , author=. Algorithmica (New York) , year=

  4. [4]

    Proceedings of the 2001 Conference on Information Sciences and Systems , year=

    On the competitiveness of on-line scheduling of unit-length packets with hard deadlines in slotted time , author=. Proceedings of the 2001 Conference on Information Sciences and Systems , year=

  5. [5]

    Proceedings of the thirty-third annual ACM symposium on Theory of computing , pages=

    Buffer overflow management in QoS switches , author=. Proceedings of the thirty-third annual ACM symposium on Theory of computing , pages=

  6. [6]

    Journal of Algorithms , volume=

    Analysis of queueing policies in QoS switches , author=. Journal of Algorithms , volume=. 2004 , publisher=

  7. [7]

    SIAM Journal on Computing , volume=

    A-competitive algorithm for scheduling packets with deadlines , author=. SIAM Journal on Computing , volume=. 2022 , publisher=

  8. [8]

    2005 , publisher=

    Online computation and competitive analysis , author=. 2005 , publisher=

  9. [9]

    Machine learning , volume=

    Regret bounds for sleeping experts and bandits , author=. Machine learning , volume=. 2010 , publisher=

  10. [10]

    2020 , publisher=

    Bandit algorithms , author=. 2020 , publisher=

  11. [11]

    Journal of Network and Computer Applications , volume=

    Quality of service (QoS) in software defined networking (SDN): A survey , author=. Journal of Network and Computer Applications , volume=. 2017 , publisher=

  12. [12]

    IEEE International Conference on Communications, 2003

    QoS control for sensor networks , author=. IEEE International Conference on Communications, 2003. ICC'03. , volume=. 2003 , organization=

  13. [13]

    2013 , publisher=

    End-to-End QoS network design: Quality of Service for rich-media & cloud networks , author=. 2013 , publisher=

  14. [14]

    Telecommunication Systems , volume=

    A survey on QoS routing protocols in Vehicular Ad Hoc Network (VANET) , author=. Telecommunication Systems , volume=. 2021 , publisher=

  15. [15]

    arXiv preprint arXiv:2305.07164 , year=

    Learning-augmented online packet scheduling with deadlines , author=. arXiv preprint arXiv:2305.07164 , year=

  16. [16]

    Proceedings of the 23rd ACM Conference on Economics and Computation , pages=

    Learning-augmented mechanism design: Leveraging predictions for facility location , author=. Proceedings of the 23rd ACM Conference on Economics and Computation , pages=

  17. [17]

    Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages=

    Learning-augmented weighted paging , author=. Proceedings of the 2022 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages=. 2022 , organization=

  18. [18]

    International Conference on Machine Learning , pages=

    On preemption and learning in stochastic scheduling , author=. International Conference on Machine Learning , pages=. 2023 , organization=

  19. [19]

    arXiv preprint arXiv:2410.21266 , year=

    Online weighted paging with unknown weights , author=. arXiv preprint arXiv:2410.21266 , year=

  20. [20]

    Machine learning , volume=

    Finite-time analysis of the multiarmed bandit problem , author=. Machine learning , volume=. 2002 , publisher=

  21. [21]

    Advances in Neural Information Processing Systems , volume=

    Refined lower bounds for adversarial bandits , author=. Advances in Neural Information Processing Systems , volume=