Deep learning often requires large amounts of data. In real-world applications, e.g., healthcare, the data collected by a single organization (e.g., a hospital) is often limited, while the bulk of massive and diverse data is segregated across multiple organizations. This motivates distributed deep learning, in which a data user builds DL models from data segregated across multiple data owners. However, the sensitive nature of such data raises severe privacy concerns, making data owners reluctant to participate. We propose LDP-DL, a privacy-preserving distributed deep learning framework based on local differential privacy and knowledge distillation, in which each data owner trains a teacher model on its own (local) private dataset, and the data user trains a student model to mimic the output of the ensemble of teacher models. In the experimental evaluation, we comprehensively compare our proposed approach (i.e., LDP-DL) against DP-SGD, PATE, and DP-FL on three popular deep learning benchmark datasets (i.e., CIFAR10, MNIST, and FashionMNIST). The experimental results show that LDP-DL consistently outperforms its competitors in terms of privacy budget and model accuracy.
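To make the teacher-student interaction concrete, the following is a minimal sketch (not the paper's actual algorithm) of how local differential privacy can be applied on the data-owner side before the data user aggregates teacher outputs: each owner perturbs its teacher's scores with Laplace noise calibrated to a privacy budget, and the data user averages the noisy scores to supervise the student. The function names, the choice of the Laplace mechanism, and the sensitivity value are illustrative assumptions.

```python
import numpy as np

def ldp_perturb(scores, epsilon, sensitivity=2.0):
    """Data-owner side: perturb one teacher's output scores with Laplace
    noise of scale sensitivity/epsilon (a standard epsilon-LDP mechanism
    for bounded scores; the sensitivity value here is an assumption)."""
    scale = sensitivity / epsilon
    return scores + np.random.laplace(0.0, scale, size=scores.shape)

def aggregate_teachers(teacher_scores, epsilon):
    """Data-user side: average the locally perturbed teacher scores.
    The raw (unperturbed) scores never leave the data owners."""
    noisy = [ldp_perturb(s, epsilon) for s in teacher_scores]
    return np.mean(noisy, axis=0)
```

A student model would then be trained to mimic `aggregate_teachers(...)` on unlabeled public data; smaller `epsilon` values give stronger privacy at the cost of noisier supervision.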