Finding anonymization mechanisms to protect personal data is at the heart of recent machine learning research. Here, we consider the consequences of local differential privacy constraints on goodness-of-fit testing, i.e., the statistical problem of assessing whether sample points are generated from a fixed density $f_0$ or not. The observations are kept hidden and replaced by a stochastic transformation satisfying the local differential privacy constraint. In this setting, we propose a testing procedure based on an estimation of the quadratic distance between the density $f$ of the unobserved samples and $f_0$. We establish an upper bound on the separation distance achieved by this test, and a matching lower bound on the minimax separation rate of testing under non-interactive privacy when $f_0$ is uniform, in both the discrete and continuous settings. To the best of our knowledge, this yields the first minimax optimal test and associated private transformation under a local differential privacy constraint over Besov balls in the continuous setting, quantifying the price to pay for data privacy. We also present a test that is adaptive to the smoothness parameter of the unknown density and remains minimax optimal up to a logarithmic factor. Finally, our results translate to the discrete case, where the treatment of probability vectors is shown to be equivalent to that of piecewise constant densities in our setting; we therefore work within a unified framework covering both the continuous and discrete cases.
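To make the setup concrete, here is a minimal sketch of one plausible instantiation, not necessarily the paper's exact construction: each user releases a one-hot histogram-bin indicator of their sample perturbed by Laplace noise of scale $2/\alpha$ (a standard calibration giving $\alpha$-LDP, since the $\ell_1$ sensitivity of the indicator is 2), and the analyst forms an unbiased U-statistic estimate of $\sum_j (p_j - p_{0,j})^2$, the quadratic distance between the binned density and the null. The function names `privatize` and `quadratic_test_stat`, the bin count, and the threshold-by-calibration step are all illustrative assumptions.

```python
import numpy as np

def privatize(x, edges, alpha, rng):
    """Non-interactive alpha-LDP release: one-hot bin indicator of each
    sample plus i.i.d. Laplace(2/alpha) noise on every coordinate. The L1
    distance between indicators of any two inputs is at most 2, so Laplace
    noise of scale 2/alpha makes the released vector alpha-LDP."""
    B = len(edges) - 1
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, B - 1)
    z = np.zeros((len(x), B))
    z[np.arange(len(x)), idx] = 1.0
    return z + rng.laplace(scale=2.0 / alpha, size=z.shape)

def quadratic_test_stat(z, p0):
    """Unbiased U-statistic estimate of sum_j (p_j - p0_j)^2 from the
    privatized vectors: the noise is independent and centered, so for
    i != i', E[(Z_ij - p0_j)(Z_i'j - p0_j)] = (p_j - p0_j)^2."""
    n = z.shape[0]
    c = z - p0                  # center at the null bin probabilities
    s = c.sum(axis=0)           # per-bin sums over users
    q = (c ** 2).sum(axis=0)    # per-bin sums of squares
    # sum over ordered pairs i != i' of c_ij * c_i'j equals s_j^2 - q_j
    return ((s ** 2 - q) / (n * (n - 1))).sum()

# Usage: testing H0: f = Uniform[0, 1] from alpha-LDP releases only.
rng = np.random.default_rng(0)
n, B, alpha = 5000, 16, 1.0
edges = np.linspace(0.0, 1.0, B + 1)
p0 = np.full(B, 1.0 / B)        # uniform null, i.e. piecewise constant f_0
x = rng.uniform(size=n)         # raw samples (drawn under H0 here)
t = quadratic_test_stat(privatize(x, edges, alpha, rng), p0)
print(t)  # near 0 under H0; reject when t exceeds a calibrated threshold
```

Under this sketch, the privatized statistic concentrates around $\|f - f_0\|_2^2$ at the binned resolution, and a rejection threshold would be calibrated from the null distribution (analytically or by simulation); the bin count plays the role of the resolution parameter that the adaptive version of the test would select from the data.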