Privacy-preserving machine learning enables models to be trained on decentralized datasets, both horizontally and vertically partitioned, without revealing the underlying data. However, it relies on specialized techniques and algorithms to perform the necessary computations. One popular example, owing to its versatility, is the privacy-preserving scalar product protocol, which computes the dot product of vectors without revealing them. Unfortunately, the solutions currently proposed in the literature focus mainly on two-party scenarios, even though settings with a higher number of data parties are becoming increasingly relevant, for example in analyses that count the number of samples fulfilling criteria defined across several sites, such as calculating the information gain at a node in a decision tree. In this paper we propose a generalization of the protocol to an arbitrary number of parties, based on an existing two-party method. Our solution relies on a recursive resolution of smaller scalar products. After describing the proposed method, we discuss its potential scalability issues. Finally, we describe its privacy guarantees, identify remaining concerns, and compare the proposed method to the original two-party solution in this respect.
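To make the underlying primitive concrete, the following is a minimal sketch of one well-known two-party scalar product construction in the commodity-server (semi-trusted third party) style; it is illustrative only and not necessarily the specific two-party method the paper builds on. The server distributes correlated random masks, so each party only ever sees a masked copy of the other's vector, yet one party recovers the exact dot product. The modulus and function names are assumptions for the sketch.

```python
import random

P = 2**61 - 1  # illustrative large prime modulus; all arithmetic is mod P

def dot(a, b):
    """Plain dot product mod P (used locally by each party)."""
    return sum(ai * bi for ai, bi in zip(a, b)) % P

def commodity_server(n):
    """Semi-trusted server: hands out random masks Ra, Rb and additive
    shares ra, rb with ra + rb = Ra . Rb (mod P)."""
    Ra = [random.randrange(P) for _ in range(n)]
    Rb = [random.randrange(P) for _ in range(n)]
    ra = random.randrange(P)
    rb = (dot(Ra, Rb) - ra) % P
    return (Ra, ra), (Rb, rb)

def secure_scalar_product(x, y):
    """Alice holds x, Bob holds y; Alice learns x . y without seeing y."""
    (Ra, ra), (Rb, rb) = commodity_server(len(x))
    x_masked = [(xi + r) % P for xi, r in zip(x, Ra)]  # Alice -> Bob
    y_masked = [(yi + r) % P for yi, r in zip(y, Rb)]  # Bob -> Alice
    t = (dot(x_masked, y) + rb) % P                    # Bob -> Alice
    # t - Ra.y_masked + ra = x.y + (ra + rb - Ra.Rb) = x.y (mod P)
    return (t - dot(Ra, y_masked) + ra) % P

x, y = [1, 2, 3], [4, 5, 6]
assert secure_scalar_product(x, y) == dot(x, y)  # 32
```

Each party only ever transmits vectors masked by uniformly random values, which is the property the multi-party generalization must preserve when it recursively decomposes the computation into smaller scalar products.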