pith. sign in

arxiv: 1802.08908 · v1 · pith:SF2ZZQLHnew · submitted 2018-02-24 · 📊 stat.ML · cs.CR· cs.LG

Scalable Private Learning with PATE

classification 📊 stat.ML cs.CRcs.LG
keywords privacylearningpatetasksteacheraggregationconsensusdata
0
0 comments X
read the original abstract

The rapid adoption of machine learning has increased concerns about the privacy implications of machine learning models trained on sensitive data, such as medical records or other personal information. To address those concerns, one promising approach is Private Aggregation of Teacher Ensembles, or PATE, which transfers to a "student" model the knowledge of an ensemble of "teacher" models, with intuitive privacy provided by training teachers on disjoint data and strong privacy guaranteed by noisy aggregation of teachers' answers. However, PATE has so far been evaluated only on simple classification tasks like MNIST, leaving unclear its utility when applied to larger-scale learning tasks and real-world datasets. In this work, we show how PATE can scale to learning tasks with large numbers of output classes and uncurated, imbalanced training data with errors. For this, we introduce new noisy aggregation mechanisms for teacher ensembles that are more selective and add less noise, and prove their tighter differential-privacy guarantees. Our new mechanisms build on two insights: the chance of teacher consensus is increased by using more concentrated noise and, lacking consensus, no answer need be given to a student. The consensus answers used are more likely to be correct, offer better intuitive privacy, and incur lower-differential privacy cost. Our evaluation shows our mechanisms improve on the original PATE on all measures, and scale to larger tasks with both high utility and very strong privacy ($\varepsilon$ < 1.0).

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy

    cs.LG 2026-04 unverdicted novelty 7.0

    DPrivBench shows that top LLMs handle basic differential privacy mechanisms but fail on advanced algorithms, exposing gaps in automated DP reasoning.

  2. DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy

    cs.LG 2026-04 accept novelty 7.0

    DPrivBench is a new benchmark for evaluating LLMs on differential privacy reasoning, with results showing good performance on textbook mechanisms but substantial failures on advanced algorithms.

  3. SynBench: A Benchmark for Differentially Private Text Generation

    cs.AI 2025-09 conditional novelty 7.0

    SynBench benchmarks DP text generators across nine datasets and uses a new MIA to show that public pre-training on portions of private data overestimates synthetic text quality and breaks DP privacy bounds.