Many applications of machine learning, such as human health research, involve processing private or sensitive information. Privacy concerns may impose significant hurdles to collaboration in scenarios where there are multiple sites holding data and the goal is to estimate properties jointly across all datasets. Differentially private decentralized algorithms can provide strong privacy guarantees. However, the accuracy of the joint estimates may be poor when the datasets at each site are small. This paper proposes a new framework, Correlation Assisted Private Estimation (CAPE), for designing privacy-preserving decentralized algorithms with better accuracy guarantees in an honest-but-curious model. CAPE can be used in conjunction with the functional mechanism for statistical and machine learning optimization problems. A tighter characterization of the functional mechanism is provided that allows CAPE to achieve the same performance as a centralized algorithm in the decentralized setting using all datasets. Empirical results on regression and neural network problems for both synthetic and real datasets show that differentially private methods can be competitive with non-private algorithms in many scenarios of interest.