Multivariate Mean Comparison under Differential Privacy

The comparison of multivariate population means is a central task of statistical inference. While statistical theory provides a variety of analysis tools, these tools usually do not protect the privacy of individuals. Awareness of this lack of protection can create incentives for study participants to conceal their true data (especially participants whose data are outliers), which may distort the analysis. In this paper we address this problem by developing a hypothesis test for multivariate mean comparisons that guarantees differential privacy to users. The test statistic is based on the popular Hotelling's $t^2$-statistic, which has a natural interpretation in terms of the Mahalanobis distance. To control the type I error, we present a bootstrap algorithm under differential privacy that provably yields a reliable test decision. In an empirical study we demonstrate the applicability of this approach.
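To make the link between Hotelling's $t^2$-statistic and the Mahalanobis distance concrete, the following is a minimal Python sketch and not the algorithm of the paper. `hotelling_t2` computes the classical, non-private two-sample statistic with a pooled covariance estimate. `laplace_private_mean` is a hypothetical helper that illustrates one standard way to release a mean under $\varepsilon$-differential privacy, assuming every coordinate is clipped to $[-\text{bound}, \text{bound}]$ and that neighboring datasets differ in one row. The function names, the clipping bound, and the choice of the Laplace mechanism are assumptions made for illustration only. The abstract does not specify how the statistic is privatized or how the bootstrap is constructed.

```python
import numpy as np


def hotelling_t2(x, y):
    """Classical (non-private) two-sample Hotelling statistic.

    x, y: arrays of shape (n1, d) and (n2, d), one observation per row.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[0], y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled sample covariance of the two groups.
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)  # np.cov returns a 0-d array when d == 1
    # Squared Mahalanobis distance between the two sample means.
    maha_sq = diff @ np.linalg.solve(pooled, diff)
    return (n1 * n2) / (n1 + n2) * maha_sq


def laplace_private_mean(x, bound, epsilon, rng):
    """Illustrative epsilon-DP mean via the Laplace mechanism (assumption).

    Coordinates are clipped to [-bound, bound]. Replacing one row then moves
    each coordinate of the mean by at most 2 * bound / n, so the L1
    sensitivity of the d-dimensional mean is 2 * bound * d / n.
    """
    x = np.clip(np.asarray(x, dtype=float), -bound, bound)
    n, d = x.shape
    scale = 2.0 * bound * d / (n * epsilon)
    return x.mean(axis=0) + rng.laplace(0.0, scale, size=d)
```

In this sketch a small `epsilon` gives stronger privacy but a noisier mean. A statistic built from such noisy quantities no longer follows the classical null distribution, which is consistent with the abstract's use of a bootstrap to calibrate the test.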