pith. machine review for the scientific record.

arxiv: 2605.06900 · v1 · submitted 2026-05-07 · 💻 cs.DS · cs.LG

Recognition: 3 theorem links

Accelerated Relax-and-Round for Concave Coverage Problems

Matthew Fahrbach, Mehraneh Liaee, Morteza Zadimoghaddam

Pith reviewed 2026-05-11 00:47 UTC · model grok-4.3

classification 💻 cs.DS cs.LG
keywords concave coverage · relax-and-round · approximation algorithms · accelerated gradient method · hypersimplex rounding · logarithmic reward · maximum coverage · multi-coverage

The pith

An accelerated gradient method with hypersimplex rounding achieves a 0.827-approximation for logarithmic concave coverage in near-linear time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a faster algorithm for approximating concave coverage problems, which extend maximum coverage by scoring each element with a concave function of how many chosen sets cover it. It replaces the standard linear programming relaxation with a projected accelerated gradient method on a smooth surrogate objective, then applies a specialized rounding procedure to the resulting fractional point in the hypersimplex. The combination yields both an improved running time of Õ(mn ε^{-1}) and tight approximation guarantees, including 0.827 for the logarithmic reward function. Experiments on maximum multi-coverage instances show the method outperforming approaches that rely on conventional linear programming solvers.
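
To make the pipeline concrete, here is a minimal sketch of the relax-then-round pattern on a toy instance. It is illustrative only: plain projected gradient ascent and a naive top-k rounding stand in for the paper's accelerated method, smooth surrogate, and Carathéodory-plus-swap-rounding step, and all instance sizes below are hypothetical.

```python
import numpy as np

# Minimal relax-and-round sketch for concave coverage with logarithmic reward.
# The paper's accelerated scheme and specialized hypersimplex rounding are NOT
# reproduced here; this is plain projected gradient ascent plus top-k rounding.

rng = np.random.default_rng(0)

m, n, k = 40, 12, 4                              # elements, sets, budget (hypothetical sizes)
A = (rng.random((m, n)) < 0.2).astype(float)     # incidence: A[e, s] = 1 if set s covers element e
phi = lambda x: np.log1p(x)                      # logarithmic reward phi(x) = log(1 + x)

def objective(y):
    # Fractional coverage of each element fed through the concave reward.
    return phi(A @ y).sum()

def project_hypersimplex(y, k, iters=50):
    # Projection onto {y in [0,1]^n : sum(y) = k} by bisection on a shift
    # (a standard capped-simplex projection, written from first principles).
    lo, hi = y.min() - 1.0, y.max()
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(y - tau, 0.0, 1.0).sum() > k:
            lo = tau
        else:
            hi = tau
    return np.clip(y - 0.5 * (lo + hi), 0.0, 1.0)

y = project_hypersimplex(np.full(n, k / n), k)
step = 0.1
for _ in range(200):                             # plain projected gradient ascent (no acceleration)
    grad = A.T @ (1.0 / (1.0 + A @ y))           # gradient of sum_e log(1 + (A y)_e)
    y = project_hypersimplex(y + step * grad, k)

# Naive stand-in for rounding: keep the k largest coordinates.  The paper
# instead decomposes y into hypersimplex vertices and applies swap rounding.
S = np.argsort(-y)[:k]
x = np.zeros(n); x[S] = 1.0
print(f"fractional value {objective(y):.3f}  rounded value {objective(x):.3f}")
```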

Core claim

We present an accelerated relax-and-round algorithm for concave coverage problems. We replace the LP relaxation step with a projected accelerated gradient method applied to a smooth surrogate objective to achieve a Õ(mn ε^{-1}) running time. We use a specialized rounding scheme for the hypersimplex. We prove tight approximation ratios for new reward functions, including a 0.827-approximation for the logarithmic reward φ(x) = log(1 + x).
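
The approximation constant is governed by the Poisson concavity ratio that also appears in Figures 3 and 4. A sketch of that quantity, assuming the Barman et al. framework the abstract builds on; the derivation of the 0.827 value for the logarithmic reward is in the paper and not reproduced here:

```latex
\alpha_{\varphi} \;=\; \inf_{x>0}\;
\frac{\mathbb{E}_{X\sim\mathrm{Pois}(x)}\!\left[\varphi(X)\right]}{\varphi(x)},
\qquad
\varphi(x)=\log(1+x)\ \Longrightarrow\ \alpha_{\varphi}\approx 0.827 .
```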

What carries the argument

The projected accelerated gradient method on the smooth surrogate objective, paired with a hypersimplex rounding scheme that combines Carathéodory decomposition with randomized swap rounding to preserve the approximation guarantee.
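
In outline (following the cited Karalias et al. and Chekuri et al. primitives; the paper's exact statements are not reproduced), the fractional point is first decomposed into hypersimplex vertices, i.e. indicator vectors of feasible k-subsets, and swap rounding then collapses the convex combination to a single random vertex while preserving the marginals:

```latex
y \;=\; \sum_{i=1}^{t} \lambda_i \,\mathbf{1}_{S_i},
\qquad \lambda_i \ge 0,\quad \sum_{i=1}^{t} \lambda_i = 1,\quad |S_i| = k,
\qquad \mathbb{E}\!\left[\mathbf{1}_{S}\right] \;=\; y ,
```

where S is the k-set returned by swap rounding. The concentration and negative-correlation properties that actually carry the approximation bound are omitted from this sketch.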

If this is right

  • Concave coverage problems admit approximation algorithms whose running time scales linearly with input size up to the accuracy parameter.
  • The logarithmic reward function φ(x) = log(1 + x) admits a 0.827-approximation via the new rounding scheme.
  • The method produces higher-quality solutions than standard LP solvers on both synthetic and real-world graph instances of maximum multi-coverage.
  • Tight approximation ratios are established for additional concave reward functions beyond the logarithmic case.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The acceleration technique may extend to other combinatorial problems that currently rely on linear programming relaxations for concave objectives.
  • Practical scalability improves for large coverage instances such as influence maximization or sensor placement where logarithmic rewards appear.
  • Alternative surrogate functions could be designed if the current smoothness requirement proves limiting for certain concave rewards.

Load-bearing premise

The surrogate objective is sufficiently smooth and the hypersimplex rounding preserves the claimed approximation ratios for the specific concave reward functions considered.

What would settle it

An instance of logarithmic concave coverage on which the algorithm's output value falls below 0.827 times the optimal coverage value, or runtime measurements that grow faster than near-linearly in the input size mn or faster than linearly in 1/ε.
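
A hedged sketch of the first test: on instances small enough to solve by brute force, compare a candidate solution's value against the true optimum of the logarithmic coverage objective. The candidate should come from the paper's algorithm; greedy is used below only as a placeholder (its own guarantee is 1 − 1/e ≈ 0.632, so a sub-0.827 ratio for greedy would not refute the paper), and the instance sizes are arbitrary.

```python
from itertools import combinations
import numpy as np

# Brute-force check of the 0.827 ratio on tiny instances.  Greedy is a
# placeholder candidate; substitute the paper's algorithm to test its claim.

rng = np.random.default_rng(1)

def log_coverage(A, S):
    # phi(x) = log(1 + x) applied to each element's coverage multiplicity.
    return np.log1p(A[:, list(S)].sum(axis=1)).sum() if S else 0.0

def brute_force_opt(A, k):
    n = A.shape[1]
    return max(log_coverage(A, S) for S in combinations(range(n), k))

def greedy(A, k):
    S = []
    for _ in range(k):
        best = max((j for j in range(A.shape[1]) if j not in S),
                   key=lambda j: log_coverage(A, S + [j]))
        S.append(best)
    return S

for trial in range(10):
    A = (rng.random((15, 8)) < 0.3).astype(float)   # toy incidence matrix
    k = 3
    opt = brute_force_opt(A, k)
    cand = log_coverage(A, greedy(A, k))
    print(f"trial {trial}: ratio {cand / opt:.3f} (guarantee under test: 0.827)")
```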

Figures

Figures reproduced from arXiv: 2605.06900 by Matthew Fahrbach, Mehraneh Liaee, Morteza Zadimoghaddam.

Figure 1. Examples of LSE smoothing.
Figure 2. Comparison of relax-and-round and greedy algorithms on hard synthetic graphs.
Figure 3. Plots of the Poisson concavity ratio αφ(x) = E_{X∼Pois(x)}[φ(X)]/φ(x) for c-multi-coverage functions φ(x) = min(x, c).
Figure 4. Plots of αφ(x; c, β) = E_{X∼Pois(x)}[φ(X; c, β)]/φ(x; c, β) for c = 1 and different β ∈ [0, 1].
Figure 5. Construction of the optimal solution (submatrices …).
Figure 6. Cyclic coverage of nodes by greedy sets y_i.
original abstract

We present an accelerated relax-and-round algorithm for concave coverage problems, which generalize the classic maximum coverage problem. Building on the relax-and-round framework of Barman et al. [STACS 2021], we propose two significant improvements. First, we replace the linear programming (LP) relaxation step with a projected accelerated gradient method applied to a smooth surrogate objective to achieve a $\widetilde{O}(mn \varepsilon^{-1})$ running time. Second, we use a specialized rounding scheme for the hypersimplex that combines the Carath\'eodory decomposition algorithm in Karalias et al. [NeurIPS 2025] with randomized swap rounding of Chekuri et al. [FOCS 2010]. We prove tight approximation ratios for new reward functions, including a $0.827$-approximation for the logarithmic reward $\varphi(x) = \log(1 + x)$. Finally, we conduct maximum multi-coverage experiments on synthetic and real-world graphs, demonstrating that our algorithm outperforms approaches that use state-of-the-art LP solvers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an accelerated relax-and-round algorithm for concave coverage problems, generalizing maximum coverage. It replaces the LP relaxation of Barman et al. with projected accelerated gradient descent on a smooth surrogate objective, yielding Õ(mn ε^{-1}) runtime. A hypersimplex rounding procedure is introduced that combines the Carathéodory decomposition of Karalias et al. with randomized swap rounding of Chekuri et al. Tight approximation ratios are claimed for several new concave reward functions, including a 0.827-approximation for φ(x) = log(1 + x). Experiments on synthetic and real-world graphs for maximum multi-coverage demonstrate practical speedups over LP-based baselines.

Significance. If the approximation and runtime claims hold, the work supplies a practically faster method for a broad class of concave coverage problems while retaining strong theoretical guarantees. The derivation of tight constants for previously unanalyzed reward functions (especially the logarithmic case) meaningfully extends the relax-and-round toolkit. The combination of accelerated first-order methods with advanced rounding schemes is technically substantive, and the experimental section provides concrete evidence of scalability gains. The paper correctly credits its foundational citations and supplies reproducible experimental protocols.

major comments (2)
  1. [§4] §4 (Approximation Analysis), proof of the 0.827 ratio for φ(x) = log(1 + x): the argument must explicitly compose the additive/multiplicative error from projected accelerated gradient descent (controlled by ε) with the expectation guarantee of the Carathéodory-plus-swap-rounding procedure. It is unclear whether the final ratio remains exactly 0.827 or degrades by an ε-dependent term that would require a separate limit argument as ε → 0.
  2. [§3.2] §3.2 (Surrogate Objective), smoothness and strong-convexity parameters: the projected accelerated gradient method is invoked with the standard Õ(1/√ε) iteration bound, but the specific surrogate for concave coverage must be shown to satisfy the required Lipschitz and smoothness constants uniformly over the hypersimplex; otherwise the claimed Õ(mn ε^{-1}) runtime does not follow directly.
minor comments (2)
  1. [§2] The definition of the hypersimplex and the precise statement of the Carathéodory decomposition step should be restated in §2 for self-contained reading, rather than relying solely on the Karalias et al. citation.
  2. [Experiments] Table 1 and Figure 2: axis labels and legend entries should explicitly indicate whether running times include the rounding phase or only the optimization phase.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We appreciate the positive assessment of the work's significance and address each major comment below, indicating the revisions we plan to incorporate.

point-by-point responses
  1. Referee: [§4] §4 (Approximation Analysis), proof of the 0.827 ratio for φ(x) = log(1 + x): the argument must explicitly compose the additive/multiplicative error from projected accelerated gradient descent (controlled by ε) with the expectation guarantee of the Carathéodory-plus-swap-rounding procedure. It is unclear whether the final ratio remains exactly 0.827 or degrades by an ε-dependent term that would require a separate limit argument as ε → 0.

    Authors: We agree that the error composition must be stated explicitly. The projected accelerated gradient descent produces an additive ε-approximation to the surrogate objective value. The subsequent Carathéodory-plus-swap-rounding step then yields an expectation that is at least 0.827 times the (approximate) surrogate value for the logarithmic reward. In the revised manuscript we will insert a short lemma in §4 that bounds the total loss: the overall guarantee is 0.827 − O(ε). Consequently, for any fixed δ > 0 one may choose ε = Θ(δ) to obtain a (0.827 − δ)-approximation in Õ(mn/δ) time. The claimed constant 0.827 is therefore tight in the limit as ε → 0, which is the standard interpretation for tunable-accuracy relax-and-round algorithms. We will also update the abstract and introduction to reflect this precise statement. revision: yes

  2. Referee: [§3.2] §3.2 (Surrogate Objective), smoothness and strong-convexity parameters: the projected accelerated gradient method is invoked with the standard Õ(1/√ε) iteration bound, but the specific surrogate for concave coverage must be shown to satisfy the required Lipschitz and smoothness constants uniformly over the hypersimplex; otherwise the claimed Õ(mn ε^{-1}) runtime does not follow directly.

    Authors: We thank the referee for highlighting this gap. The surrogate objective in §3.2 is a quadratic smoothing of the concave coverage function whose gradient is evaluated by a single matrix-vector product with the incidence matrix. In the revision we will add an explicit lemma proving that the objective's Lipschitz constant and its smoothness parameter β (the Lipschitz constant of the gradient) are bounded uniformly over the hypersimplex by quantities that depend only on the concavity modulus of φ and the maximum set size, independent of m and n. With these uniform bounds, the standard accelerated-gradient iteration count of Õ(√(β/ε)), multiplied by the O(mn) cost of each gradient step, yields the claimed Õ(mn ε^{-1}) total runtime. The added lemma will appear immediately after the definition of the surrogate. revision: yes
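
Two sketches of the accounting behind the responses above, stated under assumptions the abstract does not pin down (the paper's own lemmas may take a different form). First, the error composition for the 0.827 ratio: writing F for the fractional relaxation (so its maximizer satisfies F(y*) ≥ OPT), ŷ for the output of the gradient method, S for the rounded set, and α = 0.827,

```latex
F(\hat{y}) \;\ge\; F(y^{\star}) - \varepsilon
\quad\text{and}\quad
\mathbb{E}\!\left[f(S)\right] \;\ge\; \alpha\, F(\hat{y})
\;\;\Longrightarrow\;\;
\mathbb{E}\!\left[f(S)\right] \;\ge\; \alpha\,\mathrm{OPT} - \alpha\varepsilon
\;=\; 0.827\,\mathrm{OPT} - O(\varepsilon).
```

Second, the runtime accounting, with β denoting the surrogate's smoothness constant:

```latex
\underbrace{\widetilde{O}\!\left(\sqrt{\beta/\varepsilon}\right)}_{\text{iterations}}
\;\times\;
\underbrace{O(mn)}_{\text{cost per gradient step}}
\;=\;
\begin{cases}
\widetilde{O}\!\left(mn\,\varepsilon^{-1/2}\right) & \text{if } \beta = O(1),\\[2pt]
\widetilde{O}\!\left(mn\,\varepsilon^{-1}\right) & \text{if } \beta = \Theta(1/\varepsilon)\ \text{(Nesterov-style smoothing)}.
\end{cases}
```

Either regime is consistent with the claimed Õ(mn ε^{-1}) bound; which one is tight depends on how the smoothing parameter of the §3.2 surrogate is chosen, which the abstract does not specify.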

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper extends the relax-and-round framework of Barman et al. [STACS 2021] by substituting LP relaxation with projected accelerated gradient descent on a smooth surrogate objective and combining Carathéodory decomposition (Karalias et al. [NeurIPS 2025]) with randomized swap rounding (Chekuri et al. [FOCS 2010]) for the hypersimplex. It then proves tight approximation ratios for new concave reward functions, including the explicit 0.827 bound for φ(x) = log(1 + x). These steps rely on external prior results and independent mathematical analysis of the surrogate-plus-rounding composition; no equation or claim reduces by construction to a parameter fitted inside the paper, a self-citation chain, or a renamed internal quantity. The derivation remains self-contained against the cited external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard properties of concave functions, smoothness of the surrogate, and the correctness of the cited rounding primitives; no new free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption The reward function φ is concave and the surrogate objective is smooth.
    Required for the projected accelerated gradient method to converge at the stated rate; see the rate sketch after this ledger.
  • domain assumption The hypersimplex rounding scheme preserves the approximation ratio for the chosen concave rewards.
    Invoked when combining Carathéodory decomposition with randomized swap rounding.
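
As a reminder of what the first axiom buys, here is the standard smooth-convex rate for accelerated projected gradient (maximization form; constants indicative only, with D the diameter of the feasible region, here the hypersimplex):

```latex
F(y^{\star}) - F(y_T) \;=\; O\!\left(\frac{\beta D^{2}}{T^{2}}\right)
\quad\Longrightarrow\quad
T \;=\; O\!\left(\sqrt{\beta D^{2}/\varepsilon}\right)
\ \text{iterations suffice for additive error } \varepsilon .
```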

pith-pipeline@v0.9.0 · 5486 in / 1294 out tokens · 53745 ms · 2026-05-11T00:47:48.372654+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

47 extracted references · 4 canonical work pages

  1. [1] Alexander A. Ageev and Maxim I. Sviridenko. Pipage rounding: A new method of constructing algorithms with proven performance guarantee. Journal of Combinatorial Optimization, 8:307–328, 2004.
  2. [2] Josh Alman, Ran Duan, Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, and Renfei Zhou. More asymmetry yields faster matrix multiplication. In Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2005–2039. SIAM, 2025.
  3. [3] Man Shun Ang, Jianzhu Ma, Nianjun Liu, Kun Huang, and Yijie Wang. Fast projection onto the capped simplex with applications to sparse regression in bioinformatics. Advances in Neural Information Processing Systems, 34:9990–9999, 2021.
  4. [4] Wenruo Bai, William Stafford Noble, and Jeff A. Bilmes. Submodular maximization via gradient ascent: The case of deep submodular functions. Advances in Neural Information Processing Systems, 31, 2018.
  5. [5] Eric Balkanski, Aviad Rubinstein, and Yaron Singer. An exponential speedup in parallel running time for submodular maximization without loss in approximation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 283–302. SIAM, 2019.
  6. [6] Rafael Barbosa, Alina Ene, Huy Nguyen, and Justin Ward. The power of randomization: Distributed submodular maximization on massive datasets. In International Conference on Machine Learning, pages 1236–1244. PMLR, 2015.
  7. [7] Siddharth Barman, Omar Fawzi, and Paul Fermé. Tight approximation guarantees for concave coverage problems. In 38th International Symposium on Theoretical Aspects of Computer Science, volume 187, pages 9:1–9:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
  8. [8] Siddharth Barman, Omar Fawzi, Suprovat Ghoshal, and Emirhan Gürpınar. Tight approximation bounds for maximum multi-coverage. Mathematical Programming, 192(1):443–476, 2022.
  9. [9] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
  10. [10] Ergun Biçici and Deniz Yuret. Optimizing instance selection for statistical machine translation with feature decay algorithms. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(2):339–350, 2014.
  11. [11] Jeff Bilmes. Submodularity in machine learning and artificial intelligence. arXiv preprint arXiv:2202.00132, 2022.
  12. [12] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  13. [13] Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Dependent randomized rounding via exchange properties of combinatorial structures. In 51st Annual Symposium on Foundations of Computer Science, pages 575–584. IEEE, 2010.
  14. [14] Yixin Chen, Tonmoy Dey, and Alan Kuhnle. Scalable distributed algorithms for size-constrained submodular maximization in the MapReduce and adaptive complexity models. Journal of Artificial Intelligence Research, 80:1575–1622, 2024.
  15. [15] Michael B. Cohen, Yin Tat Lee, and Zhao Song. Solving linear programs in the current matrix multiplication time. Journal of the ACM, 68(1):1–39, 2021.
  16. [16] Cody Coleman, Christopher Yeh, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, and Matei Zaharia. Selection via proxy: Efficient data selection for deep learning. In International Conference on Learning Representations, 2020.
  17. [17] Yuan Deng, Amin Karbasi, Vahab Mirrokni, Renato Paes Leme, Grigoris Velegkas, and Song Zuo. Procurement auctions via approximately optimal submodular optimization. In Proceedings of the 42nd International Conference on Machine Learning, volume 267, pages 13152–13180. PMLR, 2025.
  18. [18] Brian W. Dolhansky and Jeff A. Bilmes. Deep submodular functions: Definitions and learning. Advances in Neural Information Processing Systems, 29, 2016.
  19. [19] Alina Ene and Huy L. Nguyen. Submodular maximization with nearly-optimal approximation and adaptivity in nearly-linear time. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 274–282. SIAM, 2019.
  20. [20] Matthew Fahrbach, Vahab Mirrokni, and Morteza Zadimoghaddam. Submodular maximization with nearly optimal approximation, adaptivity and query complexity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 255–273. SIAM, 2019.
  21. [21] Matthew Fahrbach, Srikumar Ramalingam, Morteza Zadimoghaddam, Sara Ahmadian, Gui Citovsky, and Giulia DeSalvo. GIST: Greedy independent set thresholding for max-min diversification with submodular utility. Advances in Neural Information Processing Systems.
  22. [22] Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45(4):634–652, 1998.
  23. [23] Uwe Hobohm, Michael Scharf, Reinhard Schneider, and Chris Sander. Selection of representative protein data sets. Protein Science, 1(3):409–417, 1992.
  24. [24] Christopher Hojny, Mathieu Besançon, Ksenia Bestuzheva, Sander Borst, João Dionísio, Johannes Ehls, Leon Eifler, Mohammed Ghannam, Ambros Gleixner, Adrian Göß, et al. The SCIP optimization suite 10.0. arXiv preprint arXiv:2511.18580, 2025.
  25. [25] Qi Huangfu and Julian Hall. Parallelizing the dual revised simplex method. Mathematical Programming Computation, 10(1):119–142, 2018.
  26. [26] Nikolaos Karalias, Akbar Rafiey, Yifei Xu, Zhishang Luo, Behrooz Tahmasebi, Connie Jiang, and Stefanie Jegelka. Geometric algorithms for neural combinatorial optimization with constraints. Advances in Neural Information Processing Systems, 2025.
  27. [27] Mohammad Karimi, Mario Lucic, Hamed Hassani, and Andreas Krause. Stochastic submodular maximization: The case of coverage functions. Advances in Neural Information Processing Systems, 30, 2017.
  28. [28] Katrin Kirchhoff and Jeff Bilmes. Submodularity for data selection in machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 131–141, 2014.
  29. [29] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
  30. [30] Maxwell W. Libbrecht, Jeffrey A. Bilmes, and William Stafford Noble. Choosing non-redundant representative subsets of protein sequence data sets using submodular optimization. Proteins: Structure, Function, and Bioinformatics, 86(4):454–466, 2018.
  31. [31] Paul Liu and Jan Vondrák. Submodular optimization in the MapReduce model. In 2nd Symposium on Simplicity in Algorithms, pages 18:1–18:10. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
  32. [32] Yang Lv, Guihao Wang, Dachuan Xu, and Ruiqi Yang. Efficient submodular maximization for sums of concave over modular functions. In The Fourteenth International Conference on Learning Representations, 2026.
  33. [33] Vahab Mirrokni and Morteza Zadimoghaddam. Randomized composable core-sets for distributed submodular maximization. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 153–162, 2015.
  34. [34] Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, and Andreas Krause. Distributed submodular maximization: Identifying representative elements in massive data. Advances in Neural Information Processing Systems, 26, 2013.
  35. [35] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 14:265–294, 1978.
  36. [36] Yurii Nesterov. Lectures on Convex Optimization, volume 137. Springer, 2018.
  37. [37] Laurent Perron and Vincent Furnon. OR-Tools. https://developers.google.com/optimization/.
  38. [38] Kowndinya Renduchintala, Sumit Bhatia, and Ganesh Ramakrishnan. SMART: Submodular data mixture strategy for instruction tuning. In Findings of the Association for Computational Linguistics, pages 12916–12934, 2024.
  39. [39] Noveen Sachdeva, Benjamin Coleman, Wang-Cheng Kang, Jianmo Ni, Lichan Hong, Ed H. Chi, James Caverlee, Julian McAuley, and Derek Zhiyuan Cheng. How to train data-efficient LLMs. arXiv preprint arXiv:2402.09668, 2024.
  40. [40] Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach. In Proceedings of the 6th International Conference on Learning Representations, 2018.
  41. [41] Peter Stobbe and Andreas Krause. Efficient minimization of decomposable submodular functions. Advances in Neural Information Processing Systems, 23, 2010.
  42. [42] Anvith Thudi, Evianne Rovers, Yangjun Ruan, Tristan Thrush, and Chris J. Maddison. MixMin: Finding data mixtures via convex minimization. In International Conference on Machine Learning. PMLR, 2025.
  43. [43] Jan van den Brand. A deterministic linear program solver in current matrix multiplication time. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pages 259–278. SIAM, 2020.
  44. [44] Weiran Wang and Canyi Lu. Projection onto the capped simplex. arXiv preprint arXiv:1503.01002, 2015.
  45. [45] Kai Wei, Yuzong Liu, Katrin Kirchhoff, and Jeff Bilmes. Unsupervised submodular subset selection for speech data. In International Conference on Acoustics, Speech and Signal Processing, pages 4107–4111. IEEE, 2014.
  46. [46] Kai Wei, Rishabh Iyer, and Jeff Bilmes. Submodularity in data subset selection and active learning. In International Conference on Machine Learning, pages 1954–1963. PMLR, 2015.
  47. [47] Ofer Yehuda, Avihu Dekel, Guy Hacohen, and Daphna Weinshall. Active learning through a covering lens. Advances in Neural Information Processing Systems, 35:22354–22367, 2022.