arxiv: 2604.07240 · v1 · submitted 2026-04-08 · 💻 cs.MS · cs.AI· cs.LG

Recognition: no theorem link

k-server-bench: Automating Potential Discovery for the k-Server Conjecture

Emmanuel Abb\'e, Etienne Bamas, Kirill Brilliantov

Authors on Pith no claims yet

Pith reviewed 2026-05-10 17:45 UTC · model grok-4.3

classification 💻 cs.MS cs.AIcs.LG

keywords k-server conjecturepotential functionsautomated discoverybenchmarkcompetitive analysislinear inequalitiesonline algorithms

0 comments

The pith

This paper introduces a benchmark that casts the k-server conjecture as a search for a potential function satisfying a graph-structured system of linear inequalities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors formulate the k-server conjecture's potential requirement as a large collection of simple linear inequalities arranged in a graph structure. Any function that meets every inequality supplies strong evidence toward the conjecture's special case even though it does not by itself prove the claim. Experiments show that current agentic code methods fully solve the resolved k=3 instances and cut the number of violations below those of existing potentials in the open k=4 circle case, though they do not yet eliminate all violations. The resulting task also functions as a benchmark for automated mathematical discovery because it resists early saturation and distinguishes random baselines from stronger methods more clearly than prior open-ended coding challenges. The developed tooling further lets researchers test fresh hypotheses on the k-server problem directly.

Core claim

The paper's central contribution is the definition of k-server-bench, a code-based challenge in which an automated agent must produce a potential function that satisfies every inequality in a fixed, graph-structured linear system derived from the k-server conjecture. The evaluation is sound but incomplete: any violated inequality immediately refutes the candidate, while satisfying the entire system does not constitute a proof yet still yields strong evidence and, to date, no known potential passes the open k=4 circle formulation. Experiments confirm that agentic methods solve nontrivial k=3 instances completely and reduce violations on k=4 relative to prior potentials without resolving the k

What carries the argument

The graph-structured system of linear inequalities that any candidate potential function must satisfy for the k-server conjecture.

If this is right

A candidate that passes the full k=4 system would constitute strong evidence for the conjecture and could be paired with a proof to yield a substantial theoretical result.
The benchmark supplies concrete tooling that lets k-server researchers test new potential hypotheses systematically.
Current methods already solve the entire k=3 regime and measurably improve on known potentials for k=4.
The task separates naive random baselines from sophisticated agents more sharply than earlier open-ended discovery benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same inequality-system approach could be adapted to other open problems in online algorithms whose proofs rely on potential functions.
Integrating the benchmark with symbolic or theorem-proving tools might convert a passing candidate into a complete proof.
The formulation may surface previously unnoticed structural constraints on possible potentials for the k-server problem.

Load-bearing premise

The chosen graph-structured linear inequalities faithfully encode the necessary conditions that every valid potential function for the k-server conjecture must obey.

What would settle it

An automated agent or exhaustive search that produces a potential satisfying every inequality in the k=4 circle formulation would confirm the benchmark can surface useful candidates; conversely, a proof that no such function exists under the given inequalities would show the formulation is too strict.

Figures

Figures reproduced from arXiv: 2604.07240 by Emmanuel Abb\'e, Etienne Bamas, Kirill Brilliantov.

**Figure 1.** Figure 1: 10 independent runs of ShinkaEvolve, Best-of-N, LoongFlow per task design: (i) no search, just suggest a canonical potential, (ii) with search over canonical potentials, (iii) with search without canonical potential hint. All methods had the same starting conditions per task design and had 100 evaluations cap [PITH_FULL_IMAGE:figures/full_fig_p026_1.png] view at source ↗

read the original abstract

We introduce a code-based challenge for automated, open-ended mathematical discovery based on the $k$-server conjecture, a central open problem in competitive analysis. The task is to discover a potential function satisfying a large graph-structured system of simple linear inequalities. The resulting evaluation procedure is sound but incomplete: any violated inequality definitively refutes a candidate, whereas satisfying all inequalities does not by itself constitute a proof of the corresponding conjecture's special case. Nevertheless, a candidate that passes all constraints would be strong evidence toward a valid proof and, to the best of our knowledge, no currently known potential achieves this under our formulation in the open $k=4$ circle case. As such, a successful candidate would already be an interesting contribution to the $k$-server conjecture, and could become a substantial theoretical result when paired with a full proof. Experiments on the resolved $k=3$ regime show that current agentic methods can solve nontrivial instances, and in the open $k=4$ regime they reduce the number of violations relative to existing potentials without fully resolving the task. Taken together, these results suggest that the task is challenging but plausibly within reach of current methods. Beyond its relevance to the $k$-server community, where the developed tooling enables researchers to test new hypotheses and potentially improve on the current record, the task also serves as a useful \emph{benchmark} for developing code-based discovery agents. In particular, our $k=3$ results show that it mitigates important limitations of existing open-ended code-based benchmarks, including early saturation and the weak separation between naive random baselines and more sophisticated methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This turns the k-server potential search into a clean, machine-checkable benchmark that recovers known solutions for k=3 and shows agents can cut violations on the open k=4 case.

read the letter

The paper's core move is to encode the search for a k-server potential as an open-ended task of finding a function that satisfies a fixed set of linear inequalities over a graph of configurations. Any candidate that breaks one of those inequalities is immediately invalid, which makes the test sound for rejection. They run this on the resolved k=3 case and recover the known working potentials, then apply it to the open k=4 circle instance where current hand-designed potentials still leave violations. Agent-based search reduces those violations relative to the baselines without eliminating them all. That separation is the practical result they report, and it is new in this specific form for the k-server literature. The formulation itself is drawn directly from the existing competitive-analysis setup without added parameters or circular assumptions, which keeps the burden low. The main limitation is the one they state plainly: passing the inequalities is necessary but not sufficient for a full proof, so the benchmark supplies evidence and a discovery tool rather than a resolution. Experiments stay modest in scale, but they are enough to show the task is neither trivial nor immediately saturated. Readers working on online algorithms will find a reusable way to test new potential ideas, while people building code-based discovery agents get a benchmark that avoids the early saturation seen in some prior open-ended tasks. The work is grounded enough and the evaluation reproducible enough that it merits a serious referee, even if the eventual outcome is likely to be a request for more extensive runs or tighter integration with existing k-server proofs.

Referee Report

0 major / 2 minor

Summary. The paper introduces k-server-bench, a code-based benchmark for automated discovery of potential functions satisfying a graph-structured system of linear inequalities derived from the k-server conjecture. The evaluation procedure is sound (any violation refutes a candidate) but incomplete (satisfaction provides evidence but not a proof). Experiments demonstrate that agentic methods recover known solutions on resolved k=3 instances and reduce violations relative to existing potentials on the open k=4 circle case, where no known potential satisfies the formulation. The work positions the benchmark as useful both for advancing the k-server conjecture and for evaluating open-ended code-based discovery agents, addressing limitations such as early saturation in prior benchmarks.

Significance. If the results hold, the benchmark supplies a concrete, falsifiable tool that enables systematic testing of candidate potentials for the k-server problem, with a passing candidate constituting strong evidence toward a proof in the open k=4 case. The k=3 experiments confirm recovery of known solutions and demonstrate clear separation between naive and sophisticated methods on nontrivial instances, mitigating early saturation issues in existing discovery benchmarks. The absence of free parameters or self-referential fitting in the inequality system strengthens the claim that the benchmark faithfully captures necessary conditions from the literature.

minor comments (2)

[§3.2] §3.2: the precise construction of the graph for the k=4 circle case (number of nodes, edge weights) could be stated explicitly with a small example to aid reproducibility without requiring inspection of the released code.
[Table 2] Table 2: the column reporting number of violations for agent runs on k=4 would benefit from an additional row or note indicating the best-known hand-crafted potential for direct comparison.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the work and for recommending acceptance. No major comments were raised, so we have no specific points requiring response or revision.

Circularity Check

0 steps flagged

No significant circularity: benchmark defined from external literature

full rationale

The paper defines a sound-but-incomplete benchmark consisting of graph-structured linear inequalities drawn from the established k-server conjecture literature. No parameter is fitted to data and then renamed as a prediction; the k=3 results rest on the independently resolved case rather than any self-derived equation; and the central claim is limited to the benchmark's utility for agentic discovery without asserting a proof of the open k=4 case. No load-bearing step reduces by construction to the paper's own inputs or self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces a new benchmark task built on the existing k-server conjecture without introducing fitted parameters, new mathematical entities, or additional axioms beyond the standard background of competitive analysis.

pith-pipeline@v0.9.0 · 5608 in / 1247 out tokens · 55690 ms · 2026-05-10T17:45:18.601211+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

46 extracted references · 30 canonical work pages · 3 internal anchors

[1]

Mohammed Abouzaid et al.First Proof. 2026. arXiv:2602.05192 [cs.AI].url:https: //arxiv.org/abs/2602.05192

work page arXiv 2026
[2]

On the competitive ratio of the work function algorithm for the k-server problem

Yair Bartal and Elias Koutsoupias. “On the competitive ratio of the work function algorithm for the k-server problem”. In:Theoretical computer science324.2-3 (2004), pp. 337–345

2004
[3]

The 3-server problem in the plane

Wolfgang W Bein, Marek Chrobak, and Lawrence L Larmore. “The 3-server problem in the plane”. In:Theoretical Computer Science289.1 (2002), pp. 335–354

2002
[4]

Brenner, Vincent Cohen-Addad, and David Woodruff.Solving an Open Problem in Theoretical Physics using AI-Assisted Discovery

Michael P. Brenner, Vincent Cohen-Addad, and David Woodruff.Solving an Open Problem in Theoretical Physics using AI-Assisted Discovery. 2026. arXiv:2603.04735 [cs.AI].url:https://arxiv.org/abs/2603.04735

work page arXiv 2026
[5]

Kirill Brilliantov et al.Applying language models to algebraic topology: generating sim- plicial cycles using multi-labeling in Wu’s formula. 2023. arXiv:2306.16951 [math.AT]. url:https://arxiv.org/abs/2306.16951

work page arXiv 2023
[6]

S´ ebastien Bubeck et al.Early science acceleration experiments with GPT-5. 2025. arXiv: 2511.16072 [cs.CL].url:https://arxiv.org/abs/2511.16072

work page arXiv 2025
[7]

Fran¸ cois Charton et al.PatternBoost: Constructions in Mathematics with a Little Help from AI. 2024. arXiv:2411.00566 [math.CO].url:https://arxiv.org/abs/2411. 00566. 10

work page arXiv 2024
[8]

Dawei Chen et al.Parity ofk-differentials in genus zero and one. 2026. arXiv:2602. 03722 [math.NT].url:https://arxiv.org/abs/2602.03722

work page arXiv 2026
[9]

Evan Chen et al.Fel’s Conjecture on Syzygies of Numerical Semigroups. 2026. arXiv: 2602.03716 [math.CO].url:https://arxiv.org/abs/2602.03716

work page arXiv 2026
[10]

An optimal on-line algorithm for k servers on trees

Marek Chrobak and Lawrence L Larmore. “An optimal on-line algorithm for k servers on trees”. In:SIAM Journal on Computing20.1 (1991), pp. 144–148

1991
[11]

New results on server problems

Marek Chrobak et al. “New results on server problems”. In:SIAM Journal on Discrete Mathematics4.2 (1991), pp. 172–181

1991
[12]

Christian Coester and Elias Koutsoupias.Towards the k-server conjecture: A unifying potential, pushing the frontier to the circle. 2021. arXiv:2102 . 10474 [cs.DS].url: https://arxiv.org/abs/2102.10474

work page arXiv 2021
[13]

Advancing mathematics by guiding human intuition with AI

A. Davies et al. “Advancing mathematics by guiding human intuition with AI”. In: Nature600 (2021), pp. 70–74

2021
[14]

Edgar Dobriban.Solving a Research Problem in Mathematical Statistics with AI As- sistance. 2025. arXiv:2511.18828 [math.ST].url:https://arxiv.org/abs/2511. 18828

work page arXiv 2025
[15]

Epoch AI

Epoch AI.FrontierMath Open Problems. Epoch AI. 2026.url:https://epoch.ai/ frontiermath/open-problems(visited on 02/13/2026)

2026
[16]

Tony Feng et al.Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erd˝ os Problems. 2026. arXiv:2601.22401 [cs.AI].url:https://arxiv.org/ abs/2601.22401

work page arXiv 2026
[17]

Tony Feng et al.Towards Autonomous Mathematics Research. 2026. arXiv:2602.10177 [cs.LG].url:https://arxiv.org/abs/2602.10177

work page arXiv 2026
[18]

Competitive k-server algorithms

Amos Fiat, Yuval Rabani, and Yiftach Ravid. “Competitive k-server algorithms”. In: Journal of Computer and System Sciences48.3 (1994), pp. 410–428

1994
[19]

Bogdan Georgiev et al.Mathematical exploration and discovery at scale. 2025. arXiv: 2511.02864 [cs.NE].url:https://arxiv.org/abs/2511.02864

work page arXiv 2025
[20]

Random Baselines for Simple Code Problems are Competitive with Code Evolution

Yonatan Gideoni et al. “Random Baselines for Simple Code Problems are Competitive with Code Evolution”. In:NeurIPS 2025 Fourth Workshop on Deep Learning for Code. 2025.url:https://openreview.net/forum?id=rbVIpbmbTc

2025
[21]

Elliot Glazer et al.FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI. 2025. arXiv:2411.04872 [cs.AI].url:https://arxiv.org/abs/ 2411.04872

work page arXiv 2025
[22]

Alfredo Guevara et al.Single-minus gluon tree amplitudes are nonzero. 2026. arXiv: 2602.12176 [hep-th].url:https://arxiv.org/abs/2602.12176. 11

work page arXiv 2026
[23]

Zhiyi Huang and Hanwen Zhang.Deterministic 3-Server on a Circle and the Limitation of Canonical Potentials. 2022. arXiv:2205.08103 [cs.DS].url:https://arxiv.org/ abs/2205.08103

work page arXiv 2022
[24]

Jiachen Jiang, Tianyu Ding, and Zhihui Zhu.DeltaEvolve: Accelerating Scientific Dis- covery through Momentum-Driven Evolution. 2026. arXiv:2602.02919 [cs.AI].url: https://arxiv.org/abs/2602.02919

work page arXiv 2026
[25]

Thek-server problem

Elias Koutsoupias. “Thek-server problem”. en. In:Comput. Sci. Rev.3.2 (May 2009), pp. 105–118

2009
[26]

On thek-server conjecture

Elias Koutsoupias and Christos H Papadimitriou. “On thek-server conjecture”. en. In: J. ACM42.5 (Sept. 1995), pp. 971–983

1995
[27]

Guillaume Lample and Fran¸ cois Charton.Deep Learning for Symbolic Mathematics
[28]

arXiv:1912.01412 [cs.SC]

work page arXiv 1912
[29]

Robert Tjarko Lange, Yuki Imajuku, and Edoardo Cetin.ShinkaEvolve: Towards Open- Ended And Sample-Efficient Program Evolution. 2025. arXiv:2509 . 19349 [cs.CL]. url:https://arxiv.org/abs/2509.19349

work page arXiv 2025
[30]

Chris Lu et al.The AI Scientist: Towards Fully Automated Open-Ended Scientific Dis- covery. 2024. arXiv:2408.06292 [cs.AI].url:https://arxiv.org/abs/2408.06292

work page internal anchor Pith review arXiv 2024
[31]

Journal of Algorithms6(1), 49–70 (Mar 1985),https://linkinghub.elsevier.com/retrieve/pii/0196677485900185

Mark S Manasse, Lyle A McGeoch, and Daniel D Sleator. “Competitive algorithms for server problems”. In:Journal of Algorithms11.2 (1990), pp. 208–230.issn: 0196-6774. doi:https : / / doi . org / 10 . 1016 / 0196 - 6774(90 ) 90003 - W.url:https : / / www . sciencedirect.com/science/article/pii/019667749090003W

work page arXiv 1990
[32]

Ansh Nagda, Prabhakar Raghavan, and Abhradeep Thakurta.Reinforced Generation of Combinatorial Structures: Hardness of Approximation. 2025. arXiv:2509 . 18057 [cs.LG].url:https://arxiv.org/abs/2509.18057

work page arXiv 2025
[33]

Ansh Nagda, Prabhakar Raghavan, and Abhradeep Thakurta.Reinforced Generation of Combinatorial Structures: Ramsey Numbers. 2026. arXiv:2603.09172 [math.CO]. url:https://arxiv.org/abs/2603.09172

work page internal anchor Pith review Pith/arXiv arXiv 2026
[34]

Alexander Novikov et al.AlphaEvolve: A coding agent for scientific and algorithmic discovery. 2025. arXiv:2506.13131 [cs.AI].url:https://arxiv.org/abs/2506. 13131

work page internal anchor Pith review arXiv 2025
[35]

Dylan Peifer, Michael Stillman, and Daniel Halpern-Leistner.Learning selection strate- gies in Buchberger’s algorithm. 2020. arXiv:2005.01917 [cs.LG]

work page arXiv 2020
[36]

Mathematical discoveries from program search with large language models

Bernardino Romera-Paredes et al. “Mathematical discoveries from program search with large language models”. en. In:Nature625.7995 (Jan. 2024), pp. 468–475

2024
[37]

2025.url: https://github.com/algorithmicsuperintelligence/openevolve

Asankhaya Sharma.OpenEvolve: an open-source evolutionary coding agent. 2025.url: https://github.com/algorithmicsuperintelligence/openevolve. 12

2025
[38]

Nat Sothanaphan.Resolution of Erd˝ os Problem #728: a writeup of Aristotle’s Lean proof. 2026. arXiv:2601 . 07421 [math.NT].url:https : / / arxiv . org / abs / 2601 . 07421

2026
[39]

Anja Surina et al.Algorithm Discovery With LLMs: Evolutionary Search Meets Rein- forcement Learning. 2025. arXiv:2504 . 05108 [cs.AI].url:https : / / arxiv . org / abs/2504.05108

work page arXiv 2025
[40]

Adam Zsolt Wagner.Constructions in combinatorics via neural networks. 2021. arXiv: 2104.14516 [math.CO]

work page arXiv 2021
[41]

Chunhui Wan et al.LoongFlow: Directed Evolutionary Search via a Cognitive Plan- Execute-Summarize Paradigm. 2025. arXiv:2512 . 24077 [cs.AI].url:https : / / arxiv.org/abs/2512.24077

work page arXiv 2025
[42]

Yiping Wang et al.ThetaEvolve: Test-time Learning on Open Problems. 2025. arXiv: 2511.23473 [cs.LG].url:https://arxiv.org/abs/2511.23473

work page arXiv 2025
[43]

Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning

Qiankun Zhang et al. “Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning”. In:Proceedings of the 41st Interna- tional Conference on Machine Learning. Ed. by Ruslan Salakhutdinov et al. Vol. 235. Proceedings of Machine Learning Research. PMLR, 21–27 Jul 2024, pp. 59806–59826. url:https://proceedings.mlr.pr...

2024
[44]

Tianshi Zheng et al.From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery. 2025. arXiv:2505 . 13259 [cs.CL].url:https : / / arxiv.org/abs/2505.13259. A Related Work Recent AI systems have shown growing promise in formal scientific domains, including both physics [4, 22] and mathematics [19, 14, 6, 16, 17, 37, 9, 8]. Alongs...

work page arXiv 2025
[45]

For allwand requestsr, Φ(w′)−Φ(w)≥∇(w,r), wherew′=T r(w)is the work function afterr,
[46]

"" Maps ‘config‘ (a tuple of point indices) to its index in the work-function vector, i.e., w(C) = wf[idx] where config_to_idx(C) = idx

For allw, Φ(w)≤(c+ 1) min X∈C w(X) + const. Then WFA isc-competitive. 15 Proof.Using Lemma 1 and telescoping, OPT(ρ) + WFA(ρ)≤ ∑ i ∇(wi,r i+1)≤Φ(wt+1)−Φ(w0), and the second property yields the claim after absorbing constants. B.3 Work-Function Graph We first note that work functions are shift invariant in the following sense: Lemma 2(Shift-Invariance).For...

work page arXiv 2020