arxiv: 2605.10015 · v1 · submitted 2026-05-11 · 📊 stat.ML · cs.CR· cs.LG

Recognition: 2 theorem links

· Lean Theorem

Differentially Private Sampling from Distributions via Wasserstein Projection

Satoshi Hasegawa, Seng Pei Liew, Shokichi Takakura

Pith reviewed 2026-05-12 03:34 UTC · model grok-4.3

classification 📊 stat.ML cs.CRcs.LG

keywords differential privacysamplingWasserstein distanceprojectionminimax optimalityoptimal transportprivacy mechanism

0 comments

The pith

Wasserstein projection yields a minimax optimal mechanism for differentially private sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Prior approaches to differentially private sampling rely on measures like KL divergence that ignore geometry and fail when distributions have mismatched supports. This paper instead uses the Wasserstein distance to define utility, which accounts for the underlying space's structure. It introduces the Wasserstein Projection Mechanism as the optimal way to choose a private distribution by projecting the target one onto the privacy-feasible set. The work also supplies algorithms to approximate this projection efficiently along with guarantees on how well they converge.

Core claim

The central claim is that the Wasserstein Projection Mechanism, obtained by projecting the original distribution onto the set of distributions satisfying the differential privacy constraint in the Wasserstein metric, is minimax optimal for the sampling problem, and that this projection can be computed approximately via efficient algorithms with convergence guarantees.

What carries the argument

The Wasserstein Projection Mechanism, which finds the distribution in the privacy-constrained set that is closest to the target distribution under the Wasserstein distance.

Load-bearing premise

That the Wasserstein distance serves as an appropriate and superior measure of utility for private samples compared to density-ratio measures, particularly when geometry or support differences are relevant.

What would settle it

Finding a dataset or distribution where samples from the Wasserstein Projection Mechanism are demonstrably less useful for downstream tasks than those from a KL-based mechanism, or where the approximation algorithm diverges.

Figures

Figures reproduced from arXiv: 2605.10015 by Satoshi Hasegawa, Seng Pei Liew, Shokichi Takakura.

**Figure 4.** Figure 4: Estimated distribution of check-ins for ε = 4. The true distribution is shown in the leftmost panel, and the distributions estimated by WPM and KPM are shown in the middle and right panels, respectively. Euclidean distance as a metric. Then, we evaluate the performance by the Wasserstein distance (p = 1) between the true distribution of check-ins and the estimated distribution from the released locations. … view at source ↗

read the original abstract

In this paper, we study the problem of sampling from a distribution under the constraint of differential privacy (DP). Prior works measure the utility of DP sampling with density ratio-based measures such as KL divergence. However, such formulations suffer from two key limitations: 1) they fail to capture the geometric structure of the support, and 2) they are not applicable when the supports of the distributions differ. To deal with these issues, we develop a novel framework for DP sampling with Wasserstein distance as the utility measure. In this formulation, we propose Wasserstein Projection Mechanism (WPM), a minimax optimal mechanism based on Wasserstein projection. Furthermore, we develop efficient algorithms for computing the proposed mechanisms approximately and provide convergence guarantees.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a Wasserstein-projection route to DP sampling that fixes the geometry and support-mismatch problems of KL utilities, with a claimed minimax-optimal mechanism and approximate algorithms.

read the letter

The main advance is swapping density-ratio utilities for Wasserstein distance in the DP sampling problem. This directly tackles two limits of prior work: KL and similar measures ignore geometry on the support, and they break when the target and output distributions have different supports. The authors define the Wasserstein Projection Mechanism as the minimax-optimal map under this new utility, then supply approximation algorithms plus convergence guarantees. That framing is new and the argument structure looks internally consistent—no obvious circularity or post-hoc fitting of the same data into the optimality claim.

Referee Report

0 major / 3 minor

Summary. The paper develops a framework for differentially private sampling from distributions that uses Wasserstein distance as the utility measure instead of density-ratio measures such as KL divergence. It introduces the Wasserstein Projection Mechanism (WPM) as a minimax-optimal mechanism obtained by projecting onto the set of DP-feasible distributions, develops efficient approximation algorithms for computing the mechanism, and supplies convergence guarantees for those algorithms. The approach is motivated by the inability of prior KL-based methods to capture geometric structure or to handle mismatched supports.

Significance. If the minimax optimality and convergence results hold, the work supplies a geometrically aware DP sampling primitive that directly addresses two well-known limitations of existing mechanisms. The explicit construction of tractable algorithms together with convergence guarantees is a concrete strength that improves the prospects for practical deployment in settings where Wasserstein geometry is natural (e.g., generative modeling or distribution learning under privacy constraints).

minor comments (3)

[§3.2] §3.2, Definition 3.1: the precise statement of the Wasserstein projection optimization problem would benefit from an explicit display of the feasible set and the objective; the current prose description leaves the exact program implicit.
[Theorem 4.3] Theorem 4.3: the convergence guarantee is stated in terms of iteration count, but the dependence on ambient dimension and on the Lipschitz constant of the cost function is not made explicit; this information is needed to assess scalability.
[§6] §6 (Experiments): the reported results are confined to low-dimensional synthetic distributions; a brief discussion of how the observed error scales with dimension or with the privacy parameter ε would strengthen the empirical section.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our manuscript and for recommending minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces a Wasserstein-based utility for DP sampling to address limitations of KL divergence, proposes the WPM as a minimax-optimal mechanism via projection onto the DP-feasible set, and derives approximate algorithms with convergence guarantees. No load-bearing step reduces by construction to a fitted parameter, self-definition, or self-citation chain; the framework is built from standard DP and optimal transport primitives without internal tautology. The central optimality and algorithmic claims remain independent of the inputs they are applied to.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

Based solely on the abstract, the central claim rests on standard differential privacy axioms and the definition of Wasserstein distance; no free parameters or invented entities beyond the named mechanism are visible.

axioms (2)

domain assumption Standard definition of differential privacy (epsilon-DP)
The entire formulation assumes the usual DP constraint on the sampling mechanism.
standard math Wasserstein distance is a valid metric on probability measures
Invoked as the utility measure without further justification in the abstract.

invented entities (1)

Wasserstein Projection Mechanism (WPM) no independent evidence
purpose: Minimax-optimal DP sampling mechanism using Wasserstein projection
Newly proposed in the paper as the core contribution.

pith-pipeline@v0.9.0 · 5422 in / 1289 out tokens · 47624 ms · 2026-05-12T03:34:41.741722+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
Wasserstein Projection Mechanism (WPM) ... arg min_{ν∈Q_{m,ε}} W_p(ν, μ) ... minimax optimal ... entropic regularization and alternating projections ... linear convergence guarantees
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear
LDP polytope Q_{m,ε} ... base measure m ... convex optimization reformulation f(m) = max_i ϕ_i(m)

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 2 internal anchors

[1]

Constructing elastic distinguishability metrics for location privacy.arXiv preprint arXiv:1503.00756,

Konstantinos Chatzikokolakis, Catuscia Palamidessi, and Marco Stronati. Constructing elastic distinguishability metrics for location privacy.arXiv preprint arXiv:1503.00756,

work page arXiv
[2]

Rappor: Randomized aggregatable privacy- preserving ordinal response

Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. Rappor: Randomized aggregatable privacy- preserving ordinal response. InProceedings of the 2014 ACM SIGSAC conference on computer and communications security, pages 1054–1067,

work page 2014
[3]

Locally optimal private sampling: Beyond the global minimax.arXiv preprint arXiv:2510.09485,

Hrad Ghoukasian, Bonwoo Lee, and Shahab Asoodeh. Locally optimal private sampling: Beyond the global minimax.arXiv preprint arXiv:2510.09485,

work page arXiv
[4]

Differentially private wasserstein barycenters.arXiv preprint arXiv:2510.03021,

Anming Gu, Sasidhar Kunapuli, Mark Bun, Edward Chien, and Kristjan Greenewald. Differentially private wasserstein barycenters.arXiv preprint arXiv:2510.03021,

work page arXiv
[5]

Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representa- tions in vector space.arXiv preprint arXiv:1301.3781,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Learning with differentially private (sliced) wasserstein gradients.arXiv preprint arXiv:2502.01701,

David Rodríguez-Vítores, Clément Lalanne, and Jean-Michel Loubes. Learning with differentially private (sliced) wasserstein gradients.arXiv preprint arXiv:2502.01701,

work page arXiv
[7]

InvisibleInk: High-Utility and Low-Cost Text Generation with Differential Privacy

Vishnu Vinod, Krishna Pillutla, and Abhradeep Guha Thakurta. Invisibleink: High-utility and low-cost text generation with differential privacy.arXiv preprint arXiv:2507.02974,

work page internal anchor Pith review Pith/arXiv arXiv
[8]

Then f( ¯mT )−f(m ∗)≤ βDp α r 2(1 + logk) T =O β α Dp r logk T !

Choose a constant stepsize ηt ≡η:= p 2(1 + logk) βDp √ T . Then f( ¯mT )−f(m ∗)≤ βDp α r 2(1 + logk) T =O β α Dp r logk T ! . Proof. By construction, g(t) ∈∂f(m (t)) for every t. By the regret bound for the unnormalized exponentiated gradient algorithm [Shalev-Shwartz, 2025, Theorem 2.23], for anym∈ M, TX t=1 ⟨g(t), m(t) −m⟩ ≤ DgKL(m∥m(1)) η + η 2 TX t=1 ...

work page 2025
[9]

Moreover, the mapp7→α ∗(p)is decreasing

d−2 2 , C d = Γ((d+ 1)/2)√πΓ(d/2) , gp(u) = (2−2u) p/2, S d(t) = Z 1 t fd(u) du, Ud,p(t) = Z 1 t gp(u)fd(u) du, H d,p = Z 1 −1 gp(u)fd(u) du, andt ∗ ∈(−1,1)is the unique solution of eε/2 −e −ε/2 Ud,p(t∗) +e −ε/2Hd,p =g p(t∗) e−ε/2 + eε/2 −e −ε/2 Sd(t∗) . Moreover, the mapp7→α ∗(p)is decreasing. The optimal mechanism Mm∗,ε can be interpreted as a generaliz...

work page 2022
[10]

Define Mp(t) := Z 1 −1 r(u)p µt(du) 1/p , F p(t) :=M p(t)−r(t)

Its defining equation is equivalent to Z 1 −1 gp(u)µ t∗(p)(du) =g p(t∗(p)), equivalently, Z 1 −1 r(u)p µt∗(p)(du) 1/p =r t∗(p) . Define Mp(t) := Z 1 −1 r(u)p µt(du) 1/p , F p(t) :=M p(t)−r(t). Thent ∗(p)is the unique zero ofF p. Fix t∈(−1,1) . Since c± >0 , fd(u)>0 for u∈(−1,1) , and Z(t)>0 , µt has positive density on (−1,1). Asris positive and nonconsta...

work page 1993
[11]

Taking the minimum overπ∈Π(ν, µ)yields, for everyν, OT(ν, µ)−λlogN≤OT λ(ν, µ)≤OT(ν, µ).(3) Let ν∗ ∈arg min ν∈Q Wp(ν, µ), ν λ :=M λ m,ε[µ]

Therefore, X i,j Cijπij −λlogN≤ X i,j Cijπij +λ X i,j πij logπ ij ≤ X i,j Cijπij. Taking the minimum overπ∈Π(ν, µ)yields, for everyν, OT(ν, µ)−λlogN≤OT λ(ν, µ)≤OT(ν, µ).(3) Let ν∗ ∈arg min ν∈Q Wp(ν, µ), ν λ :=M λ m,ε[µ]. Sincex7→x 1/p is strictly increasing onR +, we also have ν∗ ∈arg min ν∈Q OT(ν, µ). By definition ofM λ m,ε[µ], νλ ∈arg min ν∈Q OTλ(ν, µ)...

work page 1957
[12]

We observe that WPM captures the true distribution more accurately than KPM, which demonstrates the advantage of WPM in preserving the geometric structure of the data. N Experimental Details DatasetsMovielens-100K [Harper and Konstan, 2015] contains 100,000 ratings from 1000 users on 1700 movies, which is distributed under a usage license that permits res...

work page 2015