The Maximal Information Coefficient (MIC) is a powerful statistic for identifying dependencies between variables. However, it may be applied to sensitive data, and publishing it could leak private information. As a solution, we present algorithms that approximate MIC while providing differential privacy. We show that a natural application of the classic Laplace mechanism yields insufficient accuracy. We therefore introduce the MICr statistic, a new approximation of MIC that is more compatible with differential privacy. We prove that MICr is a consistent estimator for MIC, and we provide two differentially private versions of it. We perform experiments on a variety of real and synthetic datasets. The results show that the private MICr statistics significantly outperform direct application of the Laplace mechanism. Moreover, experiments on real-world datasets show accuracy that is usable when the sample size is at least moderately large.
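For readers unfamiliar with the Laplace mechanism mentioned above, the following is a minimal sketch of how it is usually applied to a single bounded statistic. It is not the authors' algorithm: the function names `laplace_noise` and `private_release` are hypothetical, and the `sensitivity` argument is an assumed input, since the abstract does not state a sensitivity bound for MIC. The only facts relied on are that Laplace noise with scale sensitivity/epsilon gives epsilon-differential privacy for a statistic with that sensitivity, and that MIC takes values in [0, 1], so clamping the noisy value is a privacy-preserving post-processing step.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_release(statistic, sensitivity, epsilon, rng=None):
    """Release a [0, 1]-valued statistic via the Laplace mechanism.

    `sensitivity` is an ASSUMED bound on how much the statistic can
    change when one record changes; it is not taken from the paper.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    noisy = statistic + laplace_noise(sensitivity / epsilon, rng)
    # Clamping to [0, 1] is post-processing and does not weaken privacy.
    return min(1.0, max(0.0, noisy))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical values: a statistic of 0.6 released at epsilon = 1
    # under two different assumed sensitivities.
    for sens in (0.01, 1.0):
        draws = [private_release(0.6, sens, 1.0, rng) for _ in range(5)]
        print(sens, [round(d, 3) for d in draws])
```

The noise scale grows in direct proportion to the assumed sensitivity, so when the sensitivity is comparable to the statistic's whole range, the released value carries little information. That is one plausible reading of why a direct application can be inaccurate, but the abstract itself does not give the reason.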