Closer integration of machine learning and relational databases has gained momentum in recent years because the training data for many ML tasks is the result of a relational query (most often, a join-select query). A federated setting poses an additional challenge: the tables are held by different parties as private data, and the parties want to train the model without relying on a trusted third party. Existing work has considered only the case where the training data is stored in a flat table that has been vertically partitioned, which corresponds to a simple primary-key-to-primary-key (PK-PK) join. In this paper, we describe secure protocols for computing the join results of multiple tables conforming to a general foreign-key acyclic schema, and we show how to feed the results in secret-shared form to a secure ML toolbox. Furthermore, existing secure ML systems reveal the PKs in the join results. We strengthen the privacy protection and achieve zero information leakage beyond the trained model. If the model itself is considered sensitive, we show how differential privacy can be incorporated into our framework so that the model does not breach individuals' privacy either.
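To make "join results in secret-shared form" concrete, the following is a minimal sketch of two-party additive secret sharing applied to the rows of a foreign-key join. It is not the paper's protocol: the join here is computed in plaintext purely for illustration, whereas the point of the paper is to compute it securely. The table contents, the function names, and the choice of ring size `MODULUS` are all assumptions introduced for this example.

```python
import secrets

# Ring size for the additive shares (an assumption made for this sketch).
MODULUS = 2 ** 32


def share(value):
    """Split an integer into two additive shares modulo MODULUS."""
    share_a = secrets.randbelow(MODULUS)
    share_b = (value - share_a) % MODULUS
    return share_a, share_b


def reconstruct(share_a, share_b):
    """Recombine two additive shares into the original value."""
    return (share_a + share_b) % MODULUS


# Hypothetical toy tables: customers (PK: cust_id) and orders (FK: cust_id).
customers = {1: 34, 2: 51}             # cust_id -> age
orders = [(1, 120), (2, 75), (1, 30)]  # (cust_id, amount)


def plaintext_fk_join(orders, customers):
    """Join orders to customers on cust_id, in the clear (illustration only).

    The join key is dropped from the output rows, so only feature
    values (age, amount) remain.
    """
    return [(customers[cid], amount)
            for cid, amount in orders if cid in customers]


def share_rows(rows):
    """Secret-share every value of every row between two parties."""
    shares_a, shares_b = [], []
    for row in rows:
        pairs = [share(v) for v in row]
        shares_a.append(tuple(p[0] for p in pairs))
        shares_b.append(tuple(p[1] for p in pairs))
    return shares_a, shares_b


joined = plaintext_fk_join(orders, customers)
shares_a, shares_b = share_rows(joined)

# Either party's shares alone are uniformly random; together they
# reconstruct the joined rows.
recovered = [tuple(reconstruct(a, b) for a, b in zip(row_a, row_b))
             for row_a, row_b in zip(shares_a, shares_b)]
```

In this sketch each party would hold one of `shares_a` or `shares_b`, and a secure ML toolbox operating on additive shares could train on them without either party seeing the joined rows.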