Differentially private (DP) data release is a promising technique for disseminating data without compromising the privacy of data subjects. However, most prior work has focused on scenarios where a single party owns all the data. In this paper, we focus on the multi-party setting, where different stakeholders own disjoint sets of attributes belonging to the same group of data subjects. We study linear regression in this setting, with the goal of letting all parties train models on the complete data without being able to infer the private attributes or identities of individuals. We start by directly applying the Gaussian mechanism and show that it suffers from a small eigenvalue problem. We then propose a novel method and prove that it asymptotically converges to the optimal (non-private) solution as the dataset size increases. We substantiate the theoretical results through experiments on both artificial and real-world datasets.
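As a rough illustration of the small eigenvalue problem mentioned above, the sketch below adds symmetric Gaussian noise to the Gram matrix X^T X of a toy dataset and compares the smallest eigenvalue before and after. This is only one plausible reading of the issue, not the construction used in the paper: the helper `noisy_gram`, the noise scale `sigma`, and the synthetic data are all assumptions made for illustration, and `sigma` is not calibrated to any particular privacy budget.

```python
import numpy as np

def noisy_gram(X, sigma, rng):
    """Return X^T X plus symmetric Gaussian noise (hypothetical helper).

    sigma is an illustrative noise scale, NOT calibrated to (epsilon, delta).
    """
    d = X.shape[1]
    noise = rng.normal(scale=sigma, size=(d, d))
    # Keep the upper triangle (with diagonal) and mirror it so the result is symmetric.
    noise = np.triu(noise) + np.triu(noise, k=1).T
    return X.T @ X + noise

rng = np.random.default_rng(0)
n, d = 50, 4
X = rng.uniform(-1.0, 1.0, size=(n, d))
# Make two attributes nearly collinear so that X^T X has a small eigenvalue.
X[:, 3] = X[:, 2] + 1e-3 * rng.normal(size=n)

G = X.T @ X
G_noisy = noisy_gram(X, sigma=1.0, rng=rng)

# eigvalsh is appropriate here because both matrices are symmetric.
min_eig_clean = np.linalg.eigvalsh(G).min()
min_eig_noisy = np.linalg.eigvalsh(G_noisy).min()
print("smallest eigenvalue without noise:", min_eig_clean)
print("smallest eigenvalue with noise:   ", min_eig_noisy)
```

A least-squares estimate computed from such released statistics has to invert the noisy Gram matrix. When the smallest eigenvalue of X^T X is comparable to or smaller than the noise scale, the perturbed matrix can become nearly singular or indefinite, and the resulting estimate can be unstable.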