Consider two data providers that want to contribute data to a learning model. Recent work has shown that the value of one provider's data depends on its similarity to the data owned by the other provider. It would thus be beneficial if the two providers could calculate the similarity of their data while keeping the actual data private. In this work, we devise multiparty computation protocols that compute the similarity of two data sets based on correlation while offering controllable privacy guarantees. We consider a simple model with two participating providers and develop two methods, one computing the exact correlation and one computing an approximate correlation, each with controlled information leakage. Both protocols have computational and communication complexities that are linear in the number of data samples. We also provide general bounds on the maximal error in the approximate case and analyse the resulting errors for practical parameter choices.
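To make the linear-complexity claim concrete, the following is a minimal sketch of how a correlation-based similarity splits into parts each provider can compute alone plus a single cross term. It assumes the similarity is the Pearson correlation of two equal-length samples, which the abstract does not state explicitly. It is not the protocol of this work: the cross term is computed in the clear here, whereas a private protocol would replace that one step with a secure inner product. The names `local_stats` and `correlation_from_parts` are hypothetical.

```python
import math


def local_stats(values):
    """Centre the data and compute its norm; needs only one provider's data, O(n)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return centered, norm


def correlation_from_parts(cross_term, norm_x, norm_y):
    """Pearson correlation from the cross inner product and the two local norms."""
    return cross_term / (norm_x * norm_y)


# Illustrative data only; each list stands for one provider's private samples.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

cx, nx = local_stats(x)  # computed by the first provider alone
cy, ny = local_stats(y)  # computed by the second provider alone

# ASSUMPTION: this is the only step that touches both data sets. It is done
# in the clear here purely for illustration; it is the step a secure
# two-party inner-product protocol would take over.
cross = sum(a * b for a, b in zip(cx, cy))

r = correlation_from_parts(cross, nx, ny)
print(round(r, 4))
```

Every step is a single pass over the samples, so the work grows linearly with the number of data samples, and only the cross term requires any interaction between the two providers.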