Differential privacy (DP) is by far the most widely accepted framework for mitigating privacy risks in machine learning. However, exactly how small the privacy parameter $\epsilon$ needs to be to protect against certain privacy risks in practice is still not well understood. In this work, we study data reconstruction attacks for discrete data and analyze them under the framework of multiple hypothesis testing. We utilize different variants of the celebrated Fano's inequality to derive upper bounds on the inferential power of a data reconstruction adversary when the model is trained differentially privately. Importantly, we show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power. Our analysis offers theoretical evidence for the empirical effectiveness of DP against data reconstruction attacks even at relatively large values of $\epsilon$.
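The intuition behind the $O(\log M)$ threshold can be illustrated with a heuristic sketch based on the standard form of Fano's inequality; this is an informal illustration under simplifying assumptions (the target record $V$ uniform over $M$ values, and the mutual information leaked by an $\epsilon$-DP mechanism bounded by $O(\epsilon)$), not the paper's actual derivation:

\begin{align*}
&\text{Fano's inequality: for any estimator } \hat{V} \text{ of } V \text{ from the model's output } Y,\\
&\qquad \Pr[\hat{V} \neq V] \;\geq\; 1 - \frac{I(V;Y) + \log 2}{\log M}.\\[4pt]
&\text{If } \epsilon\text{-DP implies } I(V;Y) \leq c\,\epsilon \text{ for some constant } c, \text{ then}\\
&\qquad \Pr[\hat{V} \neq V] \;\geq\; 1 - \frac{c\,\epsilon + \log 2}{\log M},
\end{align*}

so the adversary's reconstruction error remains bounded away from zero as long as $\epsilon = o(\log M)$, matching the abstract's claim that $\epsilon$ may scale as $O(\log M)$ before significant inferential power is gained.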