Knowledge distillation has emerged as a scalable and effective approach to privacy-preserving machine learning. One remaining drawback is that it consumes privacy at the model level (i.e., the client level): every distillation query incurs the privacy loss of all of one client's records. To attain fine-grained privacy accounting and improve utility, this work proposes a model-free reverse $k$-NN labeling method for record-level private knowledge distillation, in which each record is used to label at most $k$ queries. Theoretically, we provide bounds on the labeling error rate under the centralized/local/shuffle models of differential privacy (with respect to the number of records per query and the privacy budgets). Experimentally, we demonstrate that it achieves new state-of-the-art accuracy with an order of magnitude lower privacy loss. Specifically, on the CIFAR-$10$ dataset, it reaches $82.1\%$ test accuracy with a centralized privacy budget of $1.0$; on the MNIST/SVHN datasets, it reaches $99.1\%$/$95.6\%$ accuracy, respectively, with a budget of $0.1$. This is the first time that deep learning with differential privacy achieves comparable accuracy under reasonable data privacy protection (i.e., $\exp(\epsilon)\leq 1.5$). Our code is available at https://github.com/liyuntong9/rknn.
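To make the reverse $k$-NN idea concrete, the following is a minimal sketch of record-level labeling under the centralized model: each private record votes for its own label on its $k$ nearest public queries, so any single record affects at most $k$ vote counts, and the noisy vote histograms are then released. This is an illustrative sketch only, assuming NumPy arrays, Euclidean distance, and a Laplace vote release; the function name `reverse_knn_labels` and the noise calibration are our own assumptions, not the paper's exact protocol.

```python
# Illustrative sketch (not the authors' exact implementation) of
# reverse k-NN labeling: each private record labels at most its k
# nearest public queries, giving record-level sensitivity k.
import numpy as np

def reverse_knn_labels(x_priv, y_priv, x_query, num_classes, k=5, eps=1.0, rng=None):
    """Label public queries by reverse k-NN voting with Laplace noise.

    x_priv:  (n, d) private feature matrix
    y_priv:  (n,)   private labels in {0, ..., num_classes - 1}
    x_query: (m, d) public, unlabeled query features
    k:       each record votes on at most k queries
    eps:     illustrative per-record privacy budget
    """
    rng = np.random.default_rng() if rng is None else rng
    votes = np.zeros((x_query.shape[0], num_classes))

    for x, y in zip(x_priv, y_priv):
        # Distances from this private record to every public query.
        d = np.linalg.norm(x_query - x, axis=1)
        # The record casts its own label only on its k nearest queries.
        nearest = np.argsort(d)[:k]
        votes[nearest, y] += 1.0

    # Adding or removing one record changes at most k counts by 1 each,
    # so the L1 sensitivity of the vote histograms is k; Laplace noise
    # of scale k/eps gives eps-DP for this single release (assumption).
    noisy = votes + rng.laplace(scale=k / eps, size=votes.shape)
    return noisy.argmax(axis=1)
```

The key design point is the reversal of the usual direction: rather than each query consuming privacy from its nearest records, each record spends its budget on at most $k$ queries, which is what enables record-level accounting.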