Training high-performing deep learning models requires a large amount of data, which in practice is usually distributed among multiple data sources. Simply centralizing such multi-sourced data for training would raise critical security and privacy concerns, and might be prohibited under increasingly strict data regulations. To resolve the tension between privacy and data utilization in distributed learning, a machine learning framework called Private Aggregation of Teacher Ensembles (PATE) has recently been proposed. PATE harnesses the knowledge (label predictions for an unlabeled dataset) of distributed teacher models to train a student model, obviating access to the distributed datasets. Despite being appealing, PATE does not protect the individual label predictions of the teacher models, which still entails privacy risks. In this paper, we propose SEDML, a new protocol for securely and efficiently harnessing distributed knowledge in machine learning. SEDML builds on lightweight cryptography and provides strong protection for the individual label predictions, as well as differential privacy guarantees on the aggregation results. Extensive evaluations show that, while providing privacy protection, SEDML preserves the accuracy of the plaintext baseline. Meanwhile, SEDML outperforms the latest state of the art by 43 times in computation efficiency and 1.23 times in communication efficiency.
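To make the aggregation step the abstract refers to concrete, the following is a minimal plaintext sketch, assuming a PATE-style noisy-max mechanism: each teacher's label prediction for an unlabeled sample is tallied into a vote histogram, Laplace noise is added, and the noisy majority label is released with a differential privacy guarantee. The function and variable names are hypothetical; SEDML's actual protocol additionally hides the individual predictions with lightweight cryptography, which this plaintext sketch does not show.

```python
import numpy as np

def noisy_aggregate(teacher_votes, num_classes, epsilon, rng=None):
    """Return a differentially private majority label from teacher votes.

    teacher_votes: iterable of class indices, one vote per teacher.
    epsilon: privacy parameter; Laplace noise with scale 1/epsilon is added
             to each vote count before the argmax is taken.
    """
    rng = rng or np.random.default_rng()
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.laplace(loc=0.0, scale=1.0 / epsilon, size=num_classes)
    return int(np.argmax(counts))

# Example: 10 teachers vote on one unlabeled sample with 3 classes.
votes = [0, 0, 1, 0, 2, 0, 0, 1, 0, 0]
print(noisy_aggregate(votes, num_classes=3, epsilon=0.5))
```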