pith. machine review for the scientific record. sign in

arxiv: 2604.07240 · v1 · submitted 2026-04-08 · 💻 cs.MS · cs.AI· cs.LG

Recognition: no theorem link

k-server-bench: Automating Potential Discovery for the k-Server Conjecture

Emmanuel Abb\'e, Etienne Bamas, Kirill Brilliantov

Authors on Pith no claims yet

Pith reviewed 2026-05-10 17:45 UTC · model grok-4.3

classification 💻 cs.MS cs.AIcs.LG
keywords k-server conjecturepotential functionsautomated discoverybenchmarkcompetitive analysislinear inequalitiesonline algorithms
0
0 comments X

The pith

This paper introduces a benchmark that casts the k-server conjecture as a search for a potential function satisfying a graph-structured system of linear inequalities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors formulate the k-server conjecture's potential requirement as a large collection of simple linear inequalities arranged in a graph structure. Any function that meets every inequality supplies strong evidence toward the conjecture's special case even though it does not by itself prove the claim. Experiments show that current agentic code methods fully solve the resolved k=3 instances and cut the number of violations below those of existing potentials in the open k=4 circle case, though they do not yet eliminate all violations. The resulting task also functions as a benchmark for automated mathematical discovery because it resists early saturation and distinguishes random baselines from stronger methods more clearly than prior open-ended coding challenges. The developed tooling further lets researchers test fresh hypotheses on the k-server problem directly.

Core claim

The paper's central contribution is the definition of k-server-bench, a code-based challenge in which an automated agent must produce a potential function that satisfies every inequality in a fixed, graph-structured linear system derived from the k-server conjecture. The evaluation is sound but incomplete: any violated inequality immediately refutes the candidate, while satisfying the entire system does not constitute a proof yet still yields strong evidence and, to date, no known potential passes the open k=4 circle formulation. Experiments confirm that agentic methods solve nontrivial k=3 instances completely and reduce violations on k=4 relative to prior potentials without resolving the k

What carries the argument

The graph-structured system of linear inequalities that any candidate potential function must satisfy for the k-server conjecture.

If this is right

  • A candidate that passes the full k=4 system would constitute strong evidence for the conjecture and could be paired with a proof to yield a substantial theoretical result.
  • The benchmark supplies concrete tooling that lets k-server researchers test new potential hypotheses systematically.
  • Current methods already solve the entire k=3 regime and measurably improve on known potentials for k=4.
  • The task separates naive random baselines from sophisticated agents more sharply than earlier open-ended discovery benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same inequality-system approach could be adapted to other open problems in online algorithms whose proofs rely on potential functions.
  • Integrating the benchmark with symbolic or theorem-proving tools might convert a passing candidate into a complete proof.
  • The formulation may surface previously unnoticed structural constraints on possible potentials for the k-server problem.

Load-bearing premise

The chosen graph-structured linear inequalities faithfully encode the necessary conditions that every valid potential function for the k-server conjecture must obey.

What would settle it

An automated agent or exhaustive search that produces a potential satisfying every inequality in the k=4 circle formulation would confirm the benchmark can surface useful candidates; conversely, a proof that no such function exists under the given inequalities would show the formulation is too strict.

Figures

Figures reproduced from arXiv: 2604.07240 by Emmanuel Abb\'e, Etienne Bamas, Kirill Brilliantov.

Figure 1
Figure 1. Figure 1: 10 independent runs of ShinkaEvolve, Best-of-N, LoongFlow per task design: (i) no search, just suggest a canonical potential, (ii) with search over canonical potentials, (iii) with search without canonical potential hint. All methods had the same starting conditions per task design and had 100 evaluations cap [PITH_FULL_IMAGE:figures/full_fig_p026_1.png] view at source ↗
read the original abstract

We introduce a code-based challenge for automated, open-ended mathematical discovery based on the $k$-server conjecture, a central open problem in competitive analysis. The task is to discover a potential function satisfying a large graph-structured system of simple linear inequalities. The resulting evaluation procedure is sound but incomplete: any violated inequality definitively refutes a candidate, whereas satisfying all inequalities does not by itself constitute a proof of the corresponding conjecture's special case. Nevertheless, a candidate that passes all constraints would be strong evidence toward a valid proof and, to the best of our knowledge, no currently known potential achieves this under our formulation in the open $k=4$ circle case. As such, a successful candidate would already be an interesting contribution to the $k$-server conjecture, and could become a substantial theoretical result when paired with a full proof. Experiments on the resolved $k=3$ regime show that current agentic methods can solve nontrivial instances, and in the open $k=4$ regime they reduce the number of violations relative to existing potentials without fully resolving the task. Taken together, these results suggest that the task is challenging but plausibly within reach of current methods. Beyond its relevance to the $k$-server community, where the developed tooling enables researchers to test new hypotheses and potentially improve on the current record, the task also serves as a useful \emph{benchmark} for developing code-based discovery agents. In particular, our $k=3$ results show that it mitigates important limitations of existing open-ended code-based benchmarks, including early saturation and the weak separation between naive random baselines and more sophisticated methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper introduces k-server-bench, a code-based benchmark for automated discovery of potential functions satisfying a graph-structured system of linear inequalities derived from the k-server conjecture. The evaluation procedure is sound (any violation refutes a candidate) but incomplete (satisfaction provides evidence but not a proof). Experiments demonstrate that agentic methods recover known solutions on resolved k=3 instances and reduce violations relative to existing potentials on the open k=4 circle case, where no known potential satisfies the formulation. The work positions the benchmark as useful both for advancing the k-server conjecture and for evaluating open-ended code-based discovery agents, addressing limitations such as early saturation in prior benchmarks.

Significance. If the results hold, the benchmark supplies a concrete, falsifiable tool that enables systematic testing of candidate potentials for the k-server problem, with a passing candidate constituting strong evidence toward a proof in the open k=4 case. The k=3 experiments confirm recovery of known solutions and demonstrate clear separation between naive and sophisticated methods on nontrivial instances, mitigating early saturation issues in existing discovery benchmarks. The absence of free parameters or self-referential fitting in the inequality system strengthens the claim that the benchmark faithfully captures necessary conditions from the literature.

minor comments (2)
  1. [§3.2] §3.2: the precise construction of the graph for the k=4 circle case (number of nodes, edge weights) could be stated explicitly with a small example to aid reproducibility without requiring inspection of the released code.
  2. [Table 2] Table 2: the column reporting number of violations for agent runs on k=4 would benefit from an additional row or note indicating the best-known hand-crafted potential for direct comparison.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the work and for recommending acceptance. No major comments were raised, so we have no specific points requiring response or revision.

Circularity Check

0 steps flagged

No significant circularity: benchmark defined from external literature

full rationale

The paper defines a sound-but-incomplete benchmark consisting of graph-structured linear inequalities drawn from the established k-server conjecture literature. No parameter is fitted to data and then renamed as a prediction; the k=3 results rest on the independently resolved case rather than any self-derived equation; and the central claim is limited to the benchmark's utility for agentic discovery without asserting a proof of the open k=4 case. No load-bearing step reduces by construction to the paper's own inputs or self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces a new benchmark task built on the existing k-server conjecture without introducing fitted parameters, new mathematical entities, or additional axioms beyond the standard background of competitive analysis.

pith-pipeline@v0.9.0 · 5608 in / 1247 out tokens · 55690 ms · 2026-05-10T17:45:18.601211+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

46 extracted references · 30 canonical work pages · 3 internal anchors

  1. [1]

    Mohammed Abouzaid et al.First Proof. 2026. arXiv:2602.05192 [cs.AI].url:https: //arxiv.org/abs/2602.05192

  2. [2]

    On the competitive ratio of the work function algorithm for the k-server problem

    Yair Bartal and Elias Koutsoupias. “On the competitive ratio of the work function algorithm for the k-server problem”. In:Theoretical computer science324.2-3 (2004), pp. 337–345

  3. [3]

    The 3-server problem in the plane

    Wolfgang W Bein, Marek Chrobak, and Lawrence L Larmore. “The 3-server problem in the plane”. In:Theoretical Computer Science289.1 (2002), pp. 335–354

  4. [4]

    Brenner, Vincent Cohen-Addad, and David Woodruff.Solving an Open Problem in Theoretical Physics using AI-Assisted Discovery

    Michael P. Brenner, Vincent Cohen-Addad, and David Woodruff.Solving an Open Problem in Theoretical Physics using AI-Assisted Discovery. 2026. arXiv:2603.04735 [cs.AI].url:https://arxiv.org/abs/2603.04735

  5. [5]

    Kirill Brilliantov et al.Applying language models to algebraic topology: generating sim- plicial cycles using multi-labeling in Wu’s formula. 2023. arXiv:2306.16951 [math.AT]. url:https://arxiv.org/abs/2306.16951

  6. [6]

    S´ ebastien Bubeck et al.Early science acceleration experiments with GPT-5. 2025. arXiv: 2511.16072 [cs.CL].url:https://arxiv.org/abs/2511.16072

  7. [7]

    Fran¸ cois Charton et al.PatternBoost: Constructions in Mathematics with a Little Help from AI. 2024. arXiv:2411.00566 [math.CO].url:https://arxiv.org/abs/2411. 00566. 10

  8. [8]

    Dawei Chen et al.Parity ofk-differentials in genus zero and one. 2026. arXiv:2602. 03722 [math.NT].url:https://arxiv.org/abs/2602.03722

  9. [9]

    Evan Chen et al.Fel’s Conjecture on Syzygies of Numerical Semigroups. 2026. arXiv: 2602.03716 [math.CO].url:https://arxiv.org/abs/2602.03716

  10. [10]

    An optimal on-line algorithm for k servers on trees

    Marek Chrobak and Lawrence L Larmore. “An optimal on-line algorithm for k servers on trees”. In:SIAM Journal on Computing20.1 (1991), pp. 144–148

  11. [11]

    New results on server problems

    Marek Chrobak et al. “New results on server problems”. In:SIAM Journal on Discrete Mathematics4.2 (1991), pp. 172–181

  12. [12]

    Christian Coester and Elias Koutsoupias.Towards the k-server conjecture: A unifying potential, pushing the frontier to the circle. 2021. arXiv:2102 . 10474 [cs.DS].url: https://arxiv.org/abs/2102.10474

  13. [13]

    Advancing mathematics by guiding human intuition with AI

    A. Davies et al. “Advancing mathematics by guiding human intuition with AI”. In: Nature600 (2021), pp. 70–74

  14. [14]

    Edgar Dobriban.Solving a Research Problem in Mathematical Statistics with AI As- sistance. 2025. arXiv:2511.18828 [math.ST].url:https://arxiv.org/abs/2511. 18828

  15. [15]

    Epoch AI

    Epoch AI.FrontierMath Open Problems. Epoch AI. 2026.url:https://epoch.ai/ frontiermath/open-problems(visited on 02/13/2026)

  16. [16]

    Tony Feng et al.Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erd˝ os Problems. 2026. arXiv:2601.22401 [cs.AI].url:https://arxiv.org/ abs/2601.22401

  17. [17]

    Tony Feng et al.Towards Autonomous Mathematics Research. 2026. arXiv:2602.10177 [cs.LG].url:https://arxiv.org/abs/2602.10177

  18. [18]

    Competitive k-server algorithms

    Amos Fiat, Yuval Rabani, and Yiftach Ravid. “Competitive k-server algorithms”. In: Journal of Computer and System Sciences48.3 (1994), pp. 410–428

  19. [19]

    Bogdan Georgiev et al.Mathematical exploration and discovery at scale. 2025. arXiv: 2511.02864 [cs.NE].url:https://arxiv.org/abs/2511.02864

  20. [20]

    Random Baselines for Simple Code Problems are Competitive with Code Evolution

    Yonatan Gideoni et al. “Random Baselines for Simple Code Problems are Competitive with Code Evolution”. In:NeurIPS 2025 Fourth Workshop on Deep Learning for Code. 2025.url:https://openreview.net/forum?id=rbVIpbmbTc

  21. [21]

    Elliot Glazer et al.FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI. 2025. arXiv:2411.04872 [cs.AI].url:https://arxiv.org/abs/ 2411.04872

  22. [22]

    Alfredo Guevara et al.Single-minus gluon tree amplitudes are nonzero. 2026. arXiv: 2602.12176 [hep-th].url:https://arxiv.org/abs/2602.12176. 11

  23. [23]

    Zhiyi Huang and Hanwen Zhang.Deterministic 3-Server on a Circle and the Limitation of Canonical Potentials. 2022. arXiv:2205.08103 [cs.DS].url:https://arxiv.org/ abs/2205.08103

  24. [24]

    Jiachen Jiang, Tianyu Ding, and Zhihui Zhu.DeltaEvolve: Accelerating Scientific Dis- covery through Momentum-Driven Evolution. 2026. arXiv:2602.02919 [cs.AI].url: https://arxiv.org/abs/2602.02919

  25. [25]

    Thek-server problem

    Elias Koutsoupias. “Thek-server problem”. en. In:Comput. Sci. Rev.3.2 (May 2009), pp. 105–118

  26. [26]

    On thek-server conjecture

    Elias Koutsoupias and Christos H Papadimitriou. “On thek-server conjecture”. en. In: J. ACM42.5 (Sept. 1995), pp. 971–983

  27. [27]

    Guillaume Lample and Fran¸ cois Charton.Deep Learning for Symbolic Mathematics

  28. [28]

    arXiv:1912.01412 [cs.SC]

  29. [29]

    Robert Tjarko Lange, Yuki Imajuku, and Edoardo Cetin.ShinkaEvolve: Towards Open- Ended And Sample-Efficient Program Evolution. 2025. arXiv:2509 . 19349 [cs.CL]. url:https://arxiv.org/abs/2509.19349

  30. [30]

    Chris Lu et al.The AI Scientist: Towards Fully Automated Open-Ended Scientific Dis- covery. 2024. arXiv:2408.06292 [cs.AI].url:https://arxiv.org/abs/2408.06292

  31. [31]

    Journal of Algorithms6(1), 49–70 (Mar 1985),https://linkinghub.elsevier.com/retrieve/pii/0196677485900185

    Mark S Manasse, Lyle A McGeoch, and Daniel D Sleator. “Competitive algorithms for server problems”. In:Journal of Algorithms11.2 (1990), pp. 208–230.issn: 0196-6774. doi:https : / / doi . org / 10 . 1016 / 0196 - 6774(90 ) 90003 - W.url:https : / / www . sciencedirect.com/science/article/pii/019667749090003W

  32. [32]

    Ansh Nagda, Prabhakar Raghavan, and Abhradeep Thakurta.Reinforced Generation of Combinatorial Structures: Hardness of Approximation. 2025. arXiv:2509 . 18057 [cs.LG].url:https://arxiv.org/abs/2509.18057

  33. [33]

    Ansh Nagda, Prabhakar Raghavan, and Abhradeep Thakurta.Reinforced Generation of Combinatorial Structures: Ramsey Numbers. 2026. arXiv:2603.09172 [math.CO]. url:https://arxiv.org/abs/2603.09172

  34. [34]

    Alexander Novikov et al.AlphaEvolve: A coding agent for scientific and algorithmic discovery. 2025. arXiv:2506.13131 [cs.AI].url:https://arxiv.org/abs/2506. 13131

  35. [35]

    Dylan Peifer, Michael Stillman, and Daniel Halpern-Leistner.Learning selection strate- gies in Buchberger’s algorithm. 2020. arXiv:2005.01917 [cs.LG]

  36. [36]

    Mathematical discoveries from program search with large language models

    Bernardino Romera-Paredes et al. “Mathematical discoveries from program search with large language models”. en. In:Nature625.7995 (Jan. 2024), pp. 468–475

  37. [37]

    2025.url: https://github.com/algorithmicsuperintelligence/openevolve

    Asankhaya Sharma.OpenEvolve: an open-source evolutionary coding agent. 2025.url: https://github.com/algorithmicsuperintelligence/openevolve. 12

  38. [38]

    Nat Sothanaphan.Resolution of Erd˝ os Problem #728: a writeup of Aristotle’s Lean proof. 2026. arXiv:2601 . 07421 [math.NT].url:https : / / arxiv . org / abs / 2601 . 07421

  39. [39]

    Anja Surina et al.Algorithm Discovery With LLMs: Evolutionary Search Meets Rein- forcement Learning. 2025. arXiv:2504 . 05108 [cs.AI].url:https : / / arxiv . org / abs/2504.05108

  40. [40]

    Adam Zsolt Wagner.Constructions in combinatorics via neural networks. 2021. arXiv: 2104.14516 [math.CO]

  41. [41]

    Chunhui Wan et al.LoongFlow: Directed Evolutionary Search via a Cognitive Plan- Execute-Summarize Paradigm. 2025. arXiv:2512 . 24077 [cs.AI].url:https : / / arxiv.org/abs/2512.24077

  42. [42]

    Yiping Wang et al.ThetaEvolve: Test-time Learning on Open Problems. 2025. arXiv: 2511.23473 [cs.LG].url:https://arxiv.org/abs/2511.23473

  43. [43]

    Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning

    Qiankun Zhang et al. “Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning”. In:Proceedings of the 41st Interna- tional Conference on Machine Learning. Ed. by Ruslan Salakhutdinov et al. Vol. 235. Proceedings of Machine Learning Research. PMLR, 21–27 Jul 2024, pp. 59806–59826. url:https://proceedings.mlr.pr...

  44. [44]

    Tianshi Zheng et al.From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery. 2025. arXiv:2505 . 13259 [cs.CL].url:https : / / arxiv.org/abs/2505.13259. A Related Work Recent AI systems have shown growing promise in formal scientific domains, including both physics [4, 22] and mathematics [19, 14, 6, 16, 17, 37, 9, 8]. Alongs...

  45. [45]

    For allwand requestsr, Φ(w′)−Φ(w)≥∇(w,r), wherew′=T r(w)is the work function afterr,

  46. [46]

    "" Maps ‘config‘ (a tuple of point indices) to its index in the work-function vector, i.e., w(C) = wf[idx] where config_to_idx(C) = idx

    For allw, Φ(w)≤(c+ 1) min X∈C w(X) + const. Then WFA isc-competitive. 15 Proof.Using Lemma 1 and telescoping, OPT(ρ) + WFA(ρ)≤ ∑ i ∇(wi,r i+1)≤Φ(wt+1)−Φ(w0), and the second property yields the claim after absorbing constants. B.3 Work-Function Graph We first note that work functions are shift invariant in the following sense: Lemma 2(Shift-Invariance).For...