The power of private likelihood-ratio tests for goodness-of-fit in frequency tables

Privacy-protecting data analysis investigates statistical methods under privacy constraints. This is a growing challenge in modern statistics, since confidentiality guarantees are typically achieved through suitable perturbations of the data, which may cause a loss in the statistical utility of the data. In this paper, we consider privacy-protecting tests for goodness-of-fit in frequency tables, arguably the most common form of released data, and present a rigorous analysis of the large-sample behaviour of a private likelihood-ratio (LR) test. Under the framework of $(\varepsilon,\delta)$-differential privacy for perturbed data, our main contribution is the power analysis of the private LR test, which characterizes the trade-off between confidentiality, measured via the differential privacy parameters $(\varepsilon,\delta)$, and statistical utility, measured via the power of the test. This is obtained through a Bahadur-Rao large deviation expansion for the power of the private LR test, which brings out a critical quantity, a function of the sample size, the dimension of the table and $(\varepsilon,\delta)$, that determines the loss in the power of the test. This result is then applied to characterize the impact of the sample size and the dimension of the table, in connection with the parameters $(\varepsilon,\delta)$, on the loss of power of the private LR test. In particular, we determine the (sample) cost of $(\varepsilon,\delta)$-differential privacy in the private LR test, namely the additional sample size required to recover the power of the Multinomial LR test in the absence of perturbation. Our power analysis relies on a non-standard large deviation analysis for the LR, as well as on the development of a novel (sharp) large deviation principle for sums of i.i.d. random vectors, which is of independent interest.
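
The trade-off between confidentiality and power described above can be illustrated numerically. The following is a minimal Monte Carlo sketch, not the paper's method. It assumes that the table is perturbed by adding independent Laplace noise of scale $2/\varepsilon$ to each cell (which corresponds to the special case $\delta = 0$), that negative or zero perturbed counts are clipped so the LR statistic stays defined, and that the critical value is calibrated by simulation under the null. The function names `lr_statistic`, `perturb` and `estimate_power` are hypothetical.

```python
import numpy as np


def lr_statistic(counts, p0):
    """LR goodness-of-fit statistic 2 * sum_i c_i * log(c_i / (n * p0_i)).

    Counts are clipped to a tiny positive value (an assumption of this sketch)
    so that the statistic remains defined after perturbation.
    """
    c = np.clip(np.asarray(counts, dtype=float), 1e-12, None)
    p0 = np.asarray(p0, dtype=float)
    n = c.sum(axis=-1, keepdims=True)
    return 2.0 * np.sum(c * np.log(c / (n * p0)), axis=-1)


def perturb(counts, eps, rng):
    """Add Laplace noise to every cell (assumed mechanism, delta = 0).

    Scale 2 / eps: replacing one record changes two cells by one unit each.
    """
    return counts + rng.laplace(scale=2.0 / eps, size=counts.shape)


def estimate_power(n, p0, p1, eps=None, alpha=0.05, reps=20000, seed=0):
    """Monte Carlo power of the LR test of H0: p = p0 against the alternative p1.

    With eps=None the test uses the raw Multinomial table; otherwise it uses
    the perturbed table. The critical value is the simulated (1 - alpha)
    quantile of the statistic under H0, under the same perturbation.
    """
    rng = np.random.default_rng(seed)

    def draw(p):
        counts = rng.multinomial(n, p, size=reps).astype(float)
        if eps is not None:
            counts = perturb(counts, eps, rng)
        return lr_statistic(counts, p0)

    threshold = np.quantile(draw(p0), 1.0 - alpha)
    return float(np.mean(draw(p1) > threshold))


if __name__ == "__main__":
    p0 = np.full(4, 0.25)
    p1 = np.array([0.35, 0.25, 0.20, 0.20])
    print("non-private power:", estimate_power(200, p0, p1))
    print("private power    :", estimate_power(200, p0, p1, eps=0.1))
```

Rerunning `estimate_power` with a larger `n` while keeping `eps` fixed gives a rough empirical view of the sample cost of privacy, that is, how many additional observations the perturbed test needs to match the power of the unperturbed one in this toy setting.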
