Training a machine learning model over an encrypted dataset is a promising approach to privacy-preserving machine learning. However, efficiently training a deep neural network (DNN) model over encrypted data is extremely challenging for two reasons: first, it requires large-scale computation over huge datasets; second, existing solutions for computation over encrypted data, such as homomorphic encryption, are inefficient. Further, to improve the performance of a DNN model, we also need to use huge training datasets composed of data from multiple data sources that may not have pre-established trust relationships with each other. We propose a novel framework, NN-EMD, to train a DNN over multiple encrypted datasets collected from multiple sources. To this end, we propose a set of secure computation protocols using hybrid functional encryption schemes. We evaluate the framework's performance with regard to training time and model accuracy on the MNIST dataset. Compared to other existing frameworks, NN-EMD significantly reduces the training time while providing comparable model accuracy and privacy guarantees, and it supports multiple data sources. Furthermore, despite the privacy-preserving NN-EMD setting, the depth and complexity of the neural network do not affect the training time.