A common approach in machine learning is to fit a model on a large amount of training data so that it predicts test instances as accurately as possible. Nonetheless, concerns about data privacy are increasingly raised, but not always addressed. We present a secure protocol for obtaining a linear model that relies on the recently described technique of real number secret sharing. We take the PAC-Bayesian bounds as our starting point and deduce from them a closed form for the model parameters that depends on the data and the prior. Obtaining the model parameters then amounts to solving a linear system. However, we consider the situation in which several parties hold different data instances and are not willing to give up the privacy of their data. Hence, we suggest using real number secret sharing and multiparty computation to share the data and solve the linear regression securely, without violating data privacy. We propose two methods, an inverse method and a Gaussian elimination method, and compare them at the end.
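The following is a minimal, insecure sketch of the ingredients named in the abstract, not the paper's actual protocol: it uses plain additive secret sharing over the reals (a stand-in for the real number secret sharing scheme), reconstructs only the aggregated linear system in the clear rather than solving it on shares under multiparty computation, and then solves that system both by an explicit inverse and by Gaussian elimination. The helper names `share` and `reconstruct`, the toy data, and the noise scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def share(value, n_parties):
    """Additively secret-share a real value (or array) into n random-looking shares."""
    shares = [rng.normal(scale=10.0, size=np.shape(value)) for _ in range(n_parties - 1)]
    shares.append(value - sum(shares))
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares."""
    return sum(shares)

# Three parties, each holding a private slice of the data (toy data, 2 features).
n_parties, d = 3, 2
true_w = np.array([1.5, -0.7])
local_data = []
for _ in range(n_parties):
    X = rng.normal(size=(20, d))
    y = X @ true_w + 0.01 * rng.normal(size=20)
    local_data.append((X, y))

# Each party shares its local contribution to the normal equations A w = b,
# with A = sum_k X_k^T X_k and b = sum_k X_k^T y_k.
A_shares = [np.zeros((d, d)) for _ in range(n_parties)]
b_shares = [np.zeros(d) for _ in range(n_parties)]
for X, y in local_data:
    for i, s in enumerate(share(X.T @ X, n_parties)):
        A_shares[i] += s
    for i, s in enumerate(share(X.T @ y, n_parties)):
        b_shares[i] += s

# Only the aggregated system is reconstructed; individual contributions stay hidden.
A = reconstruct(A_shares)
b = reconstruct(b_shares)

# Method 1: explicit matrix inverse.  Method 2: Gaussian elimination (LU factorization
# inside np.linalg.solve).  Both recover approximately the same weight vector.
w_inverse = np.linalg.inv(A) @ b
w_gauss = np.linalg.solve(A, b)
print(w_inverse, w_gauss)
```

In the secure setting described in the abstract, the inverse and Gaussian elimination steps would themselves be carried out on secret shares via multiparty computation, so that neither the individual data instances nor the aggregated system are ever revealed to any single party.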