While machine learning has achieved remarkable results across a wide variety of domains, training models often requires large datasets that must be collected from many different individuals. Because such data may contain sensitive information, sharing training data can raise severe privacy concerns. There is therefore a compelling need for privacy-aware machine learning methods, and one effective approach is to leverage the generic framework of differential privacy. Since stochastic gradient descent (SGD) is one of the most widely used methods for large-scale machine learning, this work proposes a decentralized differentially private SGD algorithm. In particular, we focus on SGD without replacement due to its favorable structure for practical implementation. Both privacy and convergence analyses are provided for the proposed algorithm. Finally, extensive experiments demonstrate the effectiveness of the proposed method.
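As a rough illustration of the kind of mechanism the abstract refers to, the sketch below shows one epoch of SGD without replacement in which each per-sample gradient is clipped and perturbed with Gaussian noise, in the style of standard DP-SGD. This is only a minimal sketch under assumed choices (a linear model with squared-error loss, clipping bound `clip_C`, and noise scale `sigma`), not the algorithm proposed in the paper.

```python
# Minimal sketch (assumptions, not the paper's algorithm): one without-replacement
# pass of SGD with per-sample gradient clipping and Gaussian noise.
import numpy as np

def dp_sgd_epoch(w, X, y, lr=0.1, clip_C=1.0, sigma=1.0, rng=None):
    """One shuffled pass over (X, y), i.e., sampling without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    for i in rng.permutation(n):  # without-replacement ordering
        xi, yi = X[i], y[i]
        # Per-sample gradient of a squared-error loss for a linear model (assumed).
        grad = (xi @ w - yi) * xi
        # Clip the gradient to bound its sensitivity.
        grad = grad * min(1.0, clip_C / (np.linalg.norm(grad) + 1e-12))
        # Add Gaussian noise calibrated to the clipping bound.
        noise = rng.normal(scale=sigma * clip_C, size=grad.shape)
        w = w - lr * (grad + noise)
    return w

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.arange(5.0)
y = X @ w_true + 0.1 * rng.normal(size=200)
w = dp_sgd_epoch(np.zeros(5), X, y, rng=rng)
```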