Secure Machine Learning over Relational Data

Closer integration of machine learning and relational databases has gained momentum in recent years, because the training data for many ML tasks is the result of a relational query (most often a join-select query). In a federated setting this poses an additional challenge: the tables are held by different parties as private data, and the parties would like to train the model without relying on a trusted third party. Existing work has only considered the case where the training data is stored in a flat table that has been vertically partitioned, which corresponds to a simple PK-PK join. In this paper, we describe secure protocols that compute the join results of multiple tables conforming to a general foreign-key acyclic schema, and we show how to feed the results in secret-shared form to a secure ML toolbox. Furthermore, existing secure ML systems reveal the PKs in the join results. We strengthen the privacy protection and achieve zero information leakage beyond the trained model. If the model itself is considered sensitive, we show how differential privacy can be incorporated into our framework so that the model does not breach individuals' privacy.
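To make the phrase "join results in secret-shared form" concrete, the following is a minimal sketch and not the protocol of the paper. It assumes two-party additive secret sharing over the ring of integers modulo 2^64, and it computes a foreign-key join in the clear purely for illustration; a secure protocol would never see the plaintext join. The table contents and the names `share`, `reconstruct`, and `fk_join` are hypothetical.

```python
import secrets
from typing import List, Tuple

MODULUS = 2 ** 64  # assumed ring Z_{2^64} for additive sharing


def share(value: int) -> Tuple[int, int]:
    """Split value into two additive shares that sum to it modulo MODULUS."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two additive shares."""
    return (s0 + s1) % MODULUS


# Hypothetical toy tables: orders.customer_id is a foreign key that
# references the primary key of customers.
customers = {1: 30, 2: 45}  # customer_id -> age
orders = [(101, 1, 250), (102, 1, 90), (103, 2, 400)]  # (order_id, customer_id, amount)


def fk_join(order_rows, customer_table) -> List[Tuple[int, int]]:
    """Plaintext FK join, for illustration only: one (amount, age) row per order."""
    return [
        (amount, customer_table[cid])
        for _, cid, amount in order_rows
        if cid in customer_table
    ]


joined = fk_join(orders, customers)

# Each party ends up with one share of every feature value and no keys.
shared_rows = [[share(v) for v in row] for row in joined]
party0 = [[s[0] for s in row] for row in shared_rows]
party1 = [[s[1] for s in row] for row in shared_rows]

# Only combining both parties' shares recovers the feature values.
recovered = [
    tuple(reconstruct(a, b) for a, b in zip(r0, r1))
    for r0, r1 in zip(party0, party1)
]
```

Unlike a PK-PK join, the foreign-key join here repeats a customer's attributes once per matching order, so the number of output rows depends on the data. Each share on its own is a uniformly random ring element, which is why a single party's view of the shared rows reveals nothing about the feature values.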

