A common approach in system identification and machine learning is to use training data to build a model that predicts test data instances as accurately as possible. Nonetheless, concerns about data privacy are increasingly raised, but not always addressed. We present a secure protocol for learning a linear model that relies on the recently described technique of real number secret sharing. We take the PAC-Bayesian bounds as our starting point and deduce a closed form for the model parameters that depends on the data and the prior. Obtaining the model parameters then amounts to solving a linear system. However, we consider the setting where the data instances are held by several parties who are unwilling to give up the privacy of their data. Hence, we propose using real number secret sharing and secure multiparty computation to share the data and solve the linear regression problem without violating data privacy. We present two methods, a secure inverse method and a secure Gaussian elimination method, and compare them at the end. The benefit of applying secret sharing directly to real numbers is reflected in the simplicity of the protocols and the small number of rounds required. The drawback is that a share may leak a small amount of information, but our analysis argues that this leakage is small.
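To make the idea concrete, the following is a minimal sketch of real number secret sharing as additive sharing over the reals, which is not taken from the paper: the function names `share` and `reconstruct`, the choice of a Gaussian distribution for the random shares, and the parameters `n_parties` and `sigma` are illustrative assumptions. It only illustrates why shares of real numbers can be summed locally (the property exploited when the parties jointly assemble and solve the linear system) and why a single share still carries a small amount of information about the secret.

```python
import numpy as np

def share(x, n_parties=3, sigma=1e6):
    # Draw n_parties - 1 shares from a wide zero-mean Gaussian and let the
    # last share complete the sum, so that sum(shares) == x exactly.
    # A large sigma hides x well, but a single real-valued share still leaks
    # a small amount of information about x (the leakage discussed above).
    random_shares = np.random.normal(0.0, sigma, size=n_parties - 1)
    return np.append(random_shares, x - random_shares.sum())

def reconstruct(shares):
    # Only the sum of all shares reveals the secret.
    return shares.sum()

# Additive homomorphism: each party adds its shares locally, so a sum of
# secrets (e.g. entries of the jointly held linear system) is computed
# without revealing any individual party's contribution.
a_shares = share(2.5)
b_shares = share(-1.0)
sum_shares = a_shares + b_shares   # element-wise, one share per party
assert np.isclose(reconstruct(sum_shares), 1.5)
```

Because no modular reduction or fixed-point encoding is involved, protocols built on this kind of sharing stay simple and need few rounds, which is the trade-off against the small leakage noted above.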