The Geometry of Linear Program Compression: An Exact Characterization and Learning Algorithm

Omar Bennouna; Yuhan Ye

arxiv: 2605.25635 · v1 · pith:MVTI7FCWnew · submitted 2026-05-25 · 🧮 math.OC

The Geometry of Linear Program Compression: An Exact Characterization and Learning Algorithm

Yuhan Ye , Omar Bennouna This is my paper

Pith reviewed 2026-06-29 20:51 UTC · model grok-4.3

classification 🧮 math.OC

keywords linear programmingLP compressiongeometric characterizationdecision-relevant subspacesample-based learningfast-rate guaranteesexact optimality preservation

0 comments

The pith

Linear programs compress exactly into lower-dimensional equivalents that preserve optimality when objectives vary in a low-dimensional decision-relevant subspace.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that an LP can be compressed into a smaller version that yields the identical optimal solution for any new objective drawn from the same family, provided the objectives lie in a low-dimensional decision-relevant subspace. It supplies both a geometric characterization of when this exact equivalence holds and a sample-based algorithm to discover the subspace and the compression from historical data. A sympathetic reader would care because the approach supports repeated solves that run faster yet incur no loss in solution quality, unlike prior methods that only approximate the objective value. The learning procedure achieves a fast convergence rate of order d star over n rather than the slower uniform-convergence rates of approximate techniques.

Core claim

There exists an exact geometric characterization of compressed LPs that are equivalent in optimality to the original for objectives in a given distribution, together with a tractable sample-based algorithm that recovers the decision-relevant subspace and yields the true optimum on an unseen instance with probability at least 1 minus O tilde of d star over n.

What carries the argument

The decision-relevant subspace, the lowest-dimensional linear space containing all objective-function variation that affects the optimal solution, which permits an exact projection of the feasible set while leaving the optimum unchanged.

If this is right

The dimension of the compressed LP can be chosen to trade off computational speed against the probability of exact recovery on future instances.
The 1 over n dependence on sample size is strictly faster than the 1 over square root n rates of earlier approximate projection methods.
Exact preservation of the optimal solution is achieved rather than merely bounding the objective value.
Repeated LP solves become faster with no degradation in solution quality once the subspace is learned.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same geometric approach might extend to other convex problems if an analogous decision-relevant subspace can be defined.
A user with a fixed number of historical LPs could select the highest allowable dimension that still meets a target recovery probability.
Empirical checks on large-scale repeated optimization tasks could measure the practical gap between the derived probability bound and observed recovery rates.

Load-bearing premise

Historical LP samples are drawn i.i.d. from a distribution over objective functions that admits a low-dimensional decision-relevant subspace recoverable from the samples.

What would settle it

A sequence of new i.i.d. test LPs drawn from the same distribution on which the learned compressed LP returns a different optimal solution more often than the stated O tilde d star over n probability bound would falsify the exact characterization or the algorithm's guarantee.

Figures

Figures reproduced from arXiv: 2605.25635 by Omar Bennouna, Yuhan Ye.

**Figure 2.** Figure 2: Dimension growth of OursEstC [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

**Figure 3.** Figure 3: Comparison across target outside-mass levels. Rand corresponds to random projections. [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 4.** Figure 4: Comparison against the FCNN-c baseline. 5 Conclusion and Limitations We characterized exact compression of repeatedly solved linear programs through the geometry of the decision-relevant subspace dir(X ⋆ (C)): any affine slice anchored at a reachable optimizer and whose direction subspace contains dir(X ⋆ (C)) yields an exact reformulation, and a basis of dir(X ⋆ (C)) gives the minimal d ⋆ -variable one. O… view at source ↗

**Figure 5.** Figure 5: Objective ratio as the projection-baseline dimension [PITH_FULL_IMAGE:figures/full_fig_p024_5.png] view at source ↗

**Figure 6.** Figure 6: Objective ratio as the sample size varies under natural priors. [PITH_FULL_IMAGE:figures/full_fig_p025_6.png] view at source ↗

**Figure 7.** Figure 7: Learned dimension growth of our algorithm on the original synthetic LP instances. [PITH_FULL_IMAGE:figures/full_fig_p025_7.png] view at source ↗

**Figure 8.** Figure 8: Reduced-dimension sweep with an unknown prior set. [PITH_FULL_IMAGE:figures/full_fig_p025_8.png] view at source ↗

**Figure 9.** Figure 9: Reduced-dimension sweep for the plain-random [PITH_FULL_IMAGE:figures/full_fig_p026_9.png] view at source ↗

**Figure 10.** Figure 10: Sample-efficiency study for the merged DataDrivenProj baseline, with OursEstC shown as the reference. 26 [PITH_FULL_IMAGE:figures/full_fig_p026_10.png] view at source ↗

**Figure 11.** Figure 11: Reduced-dimension sweep for the DataDrivenProj baseline, with the n = 20 OursEstC result shown as the reference. 27 [PITH_FULL_IMAGE:figures/full_fig_p027_11.png] view at source ↗

read the original abstract

We study how much a linear program (LP) can be compressed when solved repeatedly, given prior knowledge about its objective function. Existing data-driven projection methods learn low-dimensional surrogate LPs with approximate objective-value guarantees, but cannot provably identify the optimal projection for a prescribed compression budget. We instead ask a sharper question: how far can an LP be compressed into a lower-dimensional equivalent while \emph{exactly} preserving optimality, enabling faster repeated solves with no loss in solution quality? We provide an exact geometric characterization of such compressed LPs, together with a tractable sample-based learning algorithm that comes with fast-rate guarantees: the compressed LP recovers the optimal solution of an unseen instance with probability at least $1-\widetilde O(d^\star/n)$, where $d^\star$ is the dimension of the decision-relevant subspace, and $n$ is the number of available historical LP samples. This $1/n$ dependence is sharper than the $\widetilde O(1/\sqrt n)$ uniform-convergence rates of approximate projection methods. Our framework further exposes a tunable tradeoff between the dimension of the compressed LP and the probability of recovering the optimal solution, allowing the user to trade compression for accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives an exact geometric characterization of optimality-preserving LP compressions and a learning algorithm with 1/n recovery rates under a low-dimensional subspace assumption.

read the letter

The main new piece is the exact geometric characterization of compressed LPs that recover the optimal solution with no loss for instances from a distribution admitting a low-dimensional decision-relevant subspace, plus the sample-based algorithm that achieves the stated 1−O~(d*/n) probability. This is sharper than the approximate projection methods cited, which stop at O(1/sqrt n) uniform rates.

The geometric view lets them identify the relevant subspace from historical samples and then compress exactly to it, which explains the faster rate once the subspace is recovered. The tunable tradeoff between compressed dimension and recovery probability is a direct consequence and looks usable in practice.

The argument is internally consistent once the subspace assumption is granted; the stress-test found no circularity or unsupported steps in the central claim. The weakest modeling choice is that the objectives must live in such a recoverable low-dimensional structure—if the distribution is fully general the method does not apply.

The abstract is thin on proof sketches and verification details, but nothing in the stated claims contradicts the equations or assumptions given. This is aimed at researchers who solve families of similar LPs repeatedly and want dimension reduction without sacrificing exact optimality. A reader working on data-driven optimization or repeated linear programming would get concrete value from the characterization and the rate improvement.

It deserves a serious referee because the exact characterization and the 1/n guarantee are specific enough to check and potentially useful if the proofs hold.

Referee Report

0 major / 3 minor

Summary. The manuscript provides an exact geometric characterization of compressed linear programs that exactly preserve optimality for objective functions drawn from a distribution admitting a low-dimensional decision-relevant subspace. It further develops a tractable sample-based learning algorithm that recovers the optimal solution of an unseen instance with probability at least 1−Õ(d∗/n), where d∗ denotes the dimension of the decision-relevant subspace and n the number of historical samples. The framework also exposes a tunable tradeoff between the dimension of the compressed LP and the recovery probability.

Significance. If the central claims hold, the work offers a meaningful advance in data-driven optimization by replacing approximate objective-value guarantees with exact optimality preservation and by achieving a fast 1/n rate that improves on the Õ(1/√n) uniform-convergence rates of prior projection methods. The explicit geometric characterization, the sample-based algorithm, and the dimension-probability tradeoff are concrete strengths that could inform practical repeated LP solving.

minor comments (3)

[Abstract] Abstract, line beginning 'the compressed LP recovers...': the probability bound is stated without an accompanying sentence that recalls the i.i.d. sampling assumption or the recoverability of the subspace; adding one clause would improve readability for readers who encounter only the abstract.
The notation ilde O is used for the probability term but is not defined in the provided abstract or claim summary; a brief parenthetical or footnote in the introduction would remove ambiguity.
[Introduction] The abstract refers to 'historical LP samples' without specifying whether the constraint matrix is fixed or also varies; a single clarifying sentence in §1 would prevent misinterpretation.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript and for recommending minor revision. No major comments appear in the report, which we interpret as an indication that the central claims, geometric characterization, sample-based algorithm, and 1/n-rate guarantees are viewed as sound. We are prepared to incorporate any minor editorial suggestions that may arise during the revision process.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's central contribution is an exact geometric characterization of compressed LPs that exactly preserve optimality for instances drawn from a distribution admitting a low-dimensional decision-relevant subspace, together with a sample-based learning algorithm. The stated recovery probability 1−O~(d*/n) is presented as a fast-rate guarantee derived from the subspace structure and i.i.d. sampling assumption; it does not reduce by construction to a fitted parameter or self-citation. No self-definitional steps, fitted inputs renamed as predictions, load-bearing self-citations, or ansatz smuggling appear in the abstract or claim structure. The argument is self-contained once the subspace existence and recoverability are granted as modeling assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the existence of a low-dimensional decision-relevant subspace and i.i.d. sampling of objectives; these are domain assumptions required for both the characterization and the stated probability bound.

axioms (2)

domain assumption Existence of a low-dimensional decision-relevant subspace for the family of objective functions
Invoked to enable exact compression while preserving optimality; central to both the geometric characterization and the learning algorithm.
domain assumption Historical LP samples are drawn i.i.d. from the distribution over objectives
Required to obtain the probability bound 1−O~(d*/n) on recovering the optimal solution for unseen instances.

pith-pipeline@v0.9.1-grok · 5737 in / 1397 out tokens · 28504 ms · 2026-06-29T20:51:16.158796+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

10 extracted references · 2 canonical work pages · 1 internal anchor

[2]

Dimitris Bertsimas and Bartolomeo Stellato

URLhttps: //arxiv.org/abs/2602.15365. Dimitris Bertsimas and Bartolomeo Stellato. The voice of optimization.Machine Learning, 110(2): 249–277,

work page arXiv
[3]

Elmachtoub, Paul Grigas, and Ambuj Tewari

Othman El Balghiti, Adam N. Elmachtoub, Paul Grigas, and Ambuj Tewari. Generalization bounds in the predict-then-optimize framework.Mathematics of Operations Research, 48(4):2043–2065,

2043
[4]

Learning Decision-Sufficient Representations for Linear Optimization

Yuhan Ye, Saurabh Amin, and Asuman E. Ozdaglar. Learning decision-sufficient representations for linear optimization.arXiv preprint arXiv:2603.18551,

work page internal anchor Pith review Pith/arXiv arXiv
[5]

Choose an extreme optimal solution of the corresponding auxiliary LP overG(c)

Hence either max x∈G(c) a⊤ j (x−x 0)>0ormin x∈G(c) a⊤ j (x−x 0)<0. Choose an extreme optimal solution of the corresponding auxiliary LP overG(c). Since G(c)is a face of the polytopeX, every extreme point ofG(c)is an extreme point ofX. The chosen point therefore belongs toG(c) ∩ X ∠ and has nonzero projection along someaj ∈S ⊥, so x−x 0 /∈S. The next three...

2021
[6]

It remains to prove the out-of-sample certificate (5). Sample-compression background.Sample compression is a classical way to prove distribution- free generalization: rather than controlling the complexity of an entire hypothesis class, one shows that the learned predictor can be reconstructed from a small subset of the training sample. This idea goes bac...

1984
[7]

sampleSof sizen, R(ρ(κ(S)))≤ 4 n 6|κ(S)|+ log e δ , where R(·)is the true0–1risk and |κ(S)| is the realized compression size

imply that, with probability at least1−δ over an i.i.d. sampleSof sizen, R(ρ(κ(S)))≤ 4 n 6|κ(S)|+ log e δ , where R(·)is the true0–1risk and |κ(S)| is the realized compression size. This fast1/n rate is specific to the realizable stable regime; in agnostic compression settings, such fast rates are not available in general (Hanneke and Kontorovich, 2019). ...

2019
[8]

Since all labels are identically zero, we suppress them in the notation; equivalently,κ stores the labeled examples(ci, 0)i∈Tn

Define the compression map κby κ(S) := (ci)i∈Tn, where the subsequence is stored in its original order. Since all labels are identically zero, we suppress them in the notation; equivalently,κ stores the labeled examples(ci, 0)i∈Tn. Define the reconstruction map ρ by rerunning Algorithm 1 on any supplied subsequenceS′ using the same fixed anchorx0 and the ...

2021
[9]

Since Lemma 10 gives at mostd⋆ total appends, the number of containment checks is at most n + d⋆

There is one final containment check for each of then samples and one additional check for each append. Since Lemma 10 gives at mostd⋆ total appends, the number of containment checks is at most n + d⋆. Each check solves only a polynomial number of LPs overX or over the optimal face of X obtained by adding the equalityc⊤ i x = v(ci), and the required ortho...

1941
[10]

Its calibrated estimated prior is an ellipsoid

computed with a ridge covariance estimate; the ridge term makes the inverse well-defined and stabilizes high-dimensional covariance estimation (Ledoit and Wolf, 2004). Its calibrated estimated prior is an ellipsoid. Proof of Proposition 13.Condition on the fitting sampleDfit, so the scoresbθ is fixed. Let S = sbθ(c) for an independent fresh drawc∼P c, and...

2004
[11]

For synthetic instances, d is the original decision dimension

21 LP instances and cost distributions.All experiments use repeated general-form LPs in inequality form, min x∈Rd {c⊤x:Ax≤b}, where inequalities originally written in the opposite direction are multiplied by−1, the feasible region X = {x : Ax≤b} ⊆R d is fixed, and only the cost vectorc∈R d varies. For synthetic instances, d is the original decision dimens...

2024

[1] [2]

Dimitris Bertsimas and Bartolomeo Stellato

URLhttps: //arxiv.org/abs/2602.15365. Dimitris Bertsimas and Bartolomeo Stellato. The voice of optimization.Machine Learning, 110(2): 249–277,

work page arXiv

[2] [3]

Elmachtoub, Paul Grigas, and Ambuj Tewari

Othman El Balghiti, Adam N. Elmachtoub, Paul Grigas, and Ambuj Tewari. Generalization bounds in the predict-then-optimize framework.Mathematics of Operations Research, 48(4):2043–2065,

2043

[3] [4]

Learning Decision-Sufficient Representations for Linear Optimization

Yuhan Ye, Saurabh Amin, and Asuman E. Ozdaglar. Learning decision-sufficient representations for linear optimization.arXiv preprint arXiv:2603.18551,

work page internal anchor Pith review Pith/arXiv arXiv

[4] [5]

Choose an extreme optimal solution of the corresponding auxiliary LP overG(c)

Hence either max x∈G(c) a⊤ j (x−x 0)>0ormin x∈G(c) a⊤ j (x−x 0)<0. Choose an extreme optimal solution of the corresponding auxiliary LP overG(c). Since G(c)is a face of the polytopeX, every extreme point ofG(c)is an extreme point ofX. The chosen point therefore belongs toG(c) ∩ X ∠ and has nonzero projection along someaj ∈S ⊥, so x−x 0 /∈S. The next three...

2021

[5] [6]

It remains to prove the out-of-sample certificate (5). Sample-compression background.Sample compression is a classical way to prove distribution- free generalization: rather than controlling the complexity of an entire hypothesis class, one shows that the learned predictor can be reconstructed from a small subset of the training sample. This idea goes bac...

1984

[6] [7]

sampleSof sizen, R(ρ(κ(S)))≤ 4 n 6|κ(S)|+ log e δ , where R(·)is the true0–1risk and |κ(S)| is the realized compression size

imply that, with probability at least1−δ over an i.i.d. sampleSof sizen, R(ρ(κ(S)))≤ 4 n 6|κ(S)|+ log e δ , where R(·)is the true0–1risk and |κ(S)| is the realized compression size. This fast1/n rate is specific to the realizable stable regime; in agnostic compression settings, such fast rates are not available in general (Hanneke and Kontorovich, 2019). ...

2019

[7] [8]

Since all labels are identically zero, we suppress them in the notation; equivalently,κ stores the labeled examples(ci, 0)i∈Tn

Define the compression map κby κ(S) := (ci)i∈Tn, where the subsequence is stored in its original order. Since all labels are identically zero, we suppress them in the notation; equivalently,κ stores the labeled examples(ci, 0)i∈Tn. Define the reconstruction map ρ by rerunning Algorithm 1 on any supplied subsequenceS′ using the same fixed anchorx0 and the ...

2021

[8] [9]

Since Lemma 10 gives at mostd⋆ total appends, the number of containment checks is at most n + d⋆

There is one final containment check for each of then samples and one additional check for each append. Since Lemma 10 gives at mostd⋆ total appends, the number of containment checks is at most n + d⋆. Each check solves only a polynomial number of LPs overX or over the optimal face of X obtained by adding the equalityc⊤ i x = v(ci), and the required ortho...

1941

[9] [10]

Its calibrated estimated prior is an ellipsoid

computed with a ridge covariance estimate; the ridge term makes the inverse well-defined and stabilizes high-dimensional covariance estimation (Ledoit and Wolf, 2004). Its calibrated estimated prior is an ellipsoid. Proof of Proposition 13.Condition on the fitting sampleDfit, so the scoresbθ is fixed. Let S = sbθ(c) for an independent fresh drawc∼P c, and...

2004

[10] [11]

For synthetic instances, d is the original decision dimension

21 LP instances and cost distributions.All experiments use repeated general-form LPs in inequality form, min x∈Rd {c⊤x:Ax≤b}, where inequalities originally written in the opposite direction are multiplied by−1, the feasible region X = {x : Ax≤b} ⊆R d is fixed, and only the cost vectorc∈R d varies. For synthetic instances, d is the original decision dimens...

2024