Deep learning models trained on large-scale data have achieved encouraging performance in many real-world tasks. However, publishing models trained on sensitive datasets, such as medical records, can pose serious privacy concerns. One of the current state-of-the-art approaches to this problem is the Private Aggregation of Teacher Ensembles (PATE), which has achieved promising results in preserving the utility of the model while providing a strong privacy guarantee. PATE trains an ensemble of "teacher" models on the sensitive data and transfers their knowledge to a "student" model through the noisy aggregation of the teachers' votes, which labels the unlabeled public data on which the student model is trained. However, the voted labels the student learns from are noisy due to the private aggregation, and learning directly from noisy labels can significantly degrade the accuracy of the student model. In this paper, we propose the PATE++ mechanism, which combines current advanced noisy-label training mechanisms with the original PATE framework to enhance its accuracy. A novel structure of Generative Adversarial Nets (GANs) is developed to integrate them effectively. In addition, we develop a novel noisy label detection mechanism for semi-supervised model training to further improve student model performance when training with noisy labels. We evaluate our method on Fashion-MNIST and SVHN, showing improvements over the original PATE on all measures.
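As a rough illustration of the noisy aggregation step described above (a minimal sketch, not the paper's exact implementation; the noise scale and class count are placeholder values), the label for one public example can be produced by counting the teachers' votes, perturbing the counts with Laplace noise, and taking the argmax:

```python
import numpy as np

def noisy_max_aggregate(teacher_votes, num_classes, noise_scale=40.0, rng=None):
    """Label one public example from an ensemble of teacher predictions.

    teacher_votes: 1-D integer array of class predictions, one entry per teacher.
    noise_scale:   Laplace scale; larger values give stronger privacy but noisier labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Count how many teachers voted for each class.
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    # Perturb the counts so that no single teacher's (and hence no single
    # training record's) influence can be inferred from the output label.
    counts += rng.laplace(loc=0.0, scale=noise_scale, size=num_classes)
    # The noisy plurality vote becomes the (possibly incorrect) student label.
    return int(np.argmax(counts))

# Example: 250 teachers voting on a 10-class problem.
rng = np.random.default_rng(0)
votes = rng.integers(0, 10, size=250)
label = noisy_max_aggregate(votes, num_classes=10, rng=rng)
```

Because the noise can flip the plurality class when the vote margin is small, some of the labels handed to the student are wrong, which is the noisy-label problem PATE++ targets.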