How can we train a machine learning model while keeping the data private and secure? We present CodedPrivateML, a fast and scalable approach to this critical problem. CodedPrivateML keeps both the data and the model information-theoretically private, while allowing efficient parallelization of training across distributed workers. We characterize CodedPrivateML's privacy threshold and prove its convergence for logistic (and linear) regression. Furthermore, via extensive experiments on Amazon EC2, we demonstrate that CodedPrivateML provides significant speedup over cryptographic approaches based on multi-party computation (MPC).
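To make the core idea concrete, the sketch below shows how a quantized dataset can be secret-shared across distributed workers so that any T colluding workers learn nothing about the raw data, which is the information-theoretic privacy property the abstract refers to. It uses a generic Shamir-style sharing over a prime field as a stand-in; CodedPrivateML's actual encoding and all parameter values here (PRIME, N_WORKERS, T_PRIVACY, the quantization scale) are illustrative assumptions, not the paper's specification.

```python
import numpy as np

# Hypothetical parameters chosen for illustration only.
PRIME = 2**26 - 5          # prime modulus for finite-field arithmetic
N_WORKERS = 5              # number of distributed workers
T_PRIVACY = 2              # privacy threshold: any T colluding workers learn nothing

def quantize(X, scale=2**8):
    """Map real-valued data to fixed-point integers in the prime field."""
    return np.mod(np.round(X * scale).astype(np.int64), PRIME)

def shamir_share(secret, n=N_WORKERS, t=T_PRIVACY, prime=PRIME):
    """Split a quantized matrix into n shares; any t of them reveal nothing."""
    rng = np.random.default_rng()
    # Random coefficients of a degree-t polynomial per matrix entry.
    coeffs = [rng.integers(0, prime, size=secret.shape) for _ in range(t)]
    shares = []
    for worker_id in range(1, n + 1):            # evaluation points 1..n
        share = secret.copy()
        for k, c in enumerate(coeffs, start=1):
            share = np.mod(share + c * pow(worker_id, k, prime), prime)
        shares.append(share)
    return shares

# Usage: each worker receives one share of the quantized dataset and runs its
# local computation on that share without ever observing the raw data X.
X = np.random.randn(100, 10)                     # toy dataset
worker_shares = shamir_share(quantize(X))
print(len(worker_shares), worker_shares[0].shape)  # 5 shares, each 100 x 10
```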