Learning Privately over Distributed Features: An ADMM Sharing Approach
Pith reviewed 2026-05-24 20:20 UTC · model grok-4.3
The pith
An ADMM sharing framework solves risk minimization over distributed features by exchanging one scalar per sample.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed ADMM sharing framework approaches risk minimization over distributed features where each party shares only a single value for each sample, with convergence and iteration complexity results under non-convex loss, and a differentially private variant with bounded privacy guarantee.
What carries the argument
ADMM sharing framework that minimizes data leakage by requiring parties to share only one scalar per sample instead of raw features or model parameters.
If this is right
- Convergence and iteration complexity are established for the parallel ADMM algorithm under non-convex loss.
- A differentially private version is introduced with privacy bounds from noise perturbation.
- The algorithms converge efficiently and robustly, with advantages over gradient-based methods on high-dimensional data.
Where Pith is reading between the lines
- The single-value sharing could reduce communication costs in addition to privacy risks.
- This framework might be adapted to other distributed optimization problems with similar privacy constraints.
- Testing on real-world vertical federated learning datasets would validate the practical privacy-utility balance.
Load-bearing premise
The optimization can be correctly performed and privacy preserved when each party shares only a single scalar value per sample without exchanging raw features or full model parameters.
What would settle it
Demonstrating that sharing the single scalar allows inference of private feature information or that the algorithm diverges on non-convex losses would falsify the approach.
read the original abstract
Distributed machine learning has been widely studied in order to handle exploding amount of data. In this paper, we study an important yet less visited distributed learning problem where features are inherently distributed or vertically partitioned among multiple parties, and sharing of raw data or model parameters among parties is prohibited due to privacy concerns. We propose an ADMM sharing framework to approach risk minimization over distributed features, where each party only needs to share a single value for each sample in the training process, thus minimizing the data leakage risk. We establish convergence and iteration complexity results for the proposed parallel ADMM algorithm under non-convex loss. We further introduce a novel differentially private ADMM sharing algorithm and bound the privacy guarantee with carefully designed noise perturbation. The experiments based on a prototype system shows that the proposed ADMM algorithms converge efficiently in a robust fashion, demonstrating advantage over gradient based methods especially for data set with high dimensional feature spaces.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an ADMM sharing framework for empirical risk minimization when features are vertically partitioned across parties that cannot share raw data or model parameters. Each party communicates only a single scalar per sample per iteration. The authors claim convergence and iteration-complexity guarantees for the parallel ADMM algorithm even under non-convex losses, introduce a differentially private variant with explicit privacy bounds obtained via carefully calibrated noise, and report experiments showing faster and more robust convergence than gradient-based alternatives, especially in high-dimensional feature spaces.
Significance. If the non-convex convergence result and the privacy composition hold under the stated assumptions, the work supplies a communication-efficient primitive for vertical federated learning that strictly limits leakage to one scalar per sample. The explicit iteration-complexity bound for non-convex objectives and the prototype-system experiments constitute concrete, verifiable contributions.
major comments (2)
- [§4.2, Theorem 2] §4.2, Theorem 2 (non-convex convergence): the proof invokes a uniform bound on the augmented Lagrangian that is stated to hold when each local subproblem is solved to sufficient accuracy, yet the manuscript provides neither an explicit accuracy requirement nor a verifiable condition on the local solvers that would guarantee this bound when features are strictly partitioned.
- [§5.3] §5.3, privacy analysis: the composition of the Gaussian mechanism across iterations is bounded using a fixed noise variance, but the analysis does not account for the dependence of the sensitivity on the dual variable updates; a concrete sensitivity calculation or a reference to the precise lemma establishing the sensitivity is missing.
minor comments (2)
- Notation: the symbol z_i is used both for the auxiliary variable in the ADMM formulation and for the noisy version in the DP section; a clarifying subscript or redefinition would remove ambiguity.
- Experiments: Table 2 reports wall-clock time but does not state the number of communication rounds or the per-round communication volume; adding these quantities would make the claimed advantage over gradient methods directly comparable.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive assessment of the work's significance for vertical federated learning. We address each major comment below with clarifications and commit to revisions where the manuscript is incomplete.
read point-by-point responses
-
Referee: [§4.2, Theorem 2] §4.2, Theorem 2 (non-convex convergence): the proof invokes a uniform bound on the augmented Lagrangian that is stated to hold when each local subproblem is solved to sufficient accuracy, yet the manuscript provides neither an explicit accuracy requirement nor a verifiable condition on the local solvers that would guarantee this bound when features are strictly partitioned.
Authors: We agree that the current presentation of Theorem 2 leaves the local solver accuracy implicit. The proof relies on the standard ADMM assumption that each party's local subproblem (which depends only on its private features) is solved to a precision sufficient to keep the augmented Lagrangian bounded; because the features are vertically partitioned, this local solve is always feasible without cross-party data sharing. In the revision we will add an explicit accuracy requirement (e.g., each local objective minimized to within ε_t with ∑ε_t < ∞) together with a short lemma showing that this condition is verifiable from local quantities alone and suffices to preserve the uniform bound used in the non-convex convergence argument. revision: yes
-
Referee: [§5.3] §5.3, privacy analysis: the composition of the Gaussian mechanism across iterations is bounded using a fixed noise variance, but the analysis does not account for the dependence of the sensitivity on the dual variable updates; a concrete sensitivity calculation or a reference to the precise lemma establishing the sensitivity is missing.
Authors: The privacy section calibrates noise to a worst-case sensitivity derived from the Lipschitz constant of the loss and the bounded dual updates; however, we acknowledge that an explicit sensitivity derivation accounting for the dual-variable dependence is not written out. In the revised manuscript we will insert a short lemma (or cite the standard result from the Gaussian mechanism literature) that bounds the sensitivity of the communicated scalar independently of the dual variables under the problem assumptions, thereby justifying the fixed noise variance and the subsequent composition bound. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper introduces a novel ADMM sharing formulation for vertically partitioned features, derives convergence and iteration complexity bounds for non-convex losses via standard ADMM analysis, and composes differential privacy via calibrated noise on the shared scalars. These results follow directly from the algorithm's update rules and perturbation mechanism without any reduction of outputs to fitted parameters, self-referential definitions, or load-bearing self-citations. The central claims rest on independent mathematical arguments applied to the proposed updates rather than on renaming or smuggling prior results by the same authors.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption ADMM converges for the non-convex loss functions considered in the distributed feature setting
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.