Most methods for publishing data with privacy guarantees introduce randomness into datasets, which reduces the utility of the published data. In this paper, we study the privacy-utility tradeoff by taking maximal leakage as the privacy measure and the expected Hamming distortion as the utility measure. We study three different but related problems. First, we assume that the data-generating distribution (i.e., the prior) is known, and we find the optimal privacy mechanism that achieves the smallest distortion subject to a constraint on maximal leakage. Then, we assume that the prior belongs to some set of distributions, and we formulate a min-max problem for finding the smallest distortion achievable for the worst-case prior in the set, subject to a maximal leakage constraint. Lastly, we define a partial order on privacy mechanisms based on the largest distortion they generate. Our results show that when the prior distribution is known, the optimal privacy mechanism fully discloses the symbols with the largest prior probabilities and suppresses the symbols with the smallest prior probabilities. Furthermore, we show that sets of priors containing more uniform distributions lead to larger distortion, while privacy mechanisms that distribute the privacy budget more uniformly over the symbols yield smaller worst-case distortion.
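The quantities in the abstract can be made concrete with a small numerical sketch. The snippet below is illustrative only and not the paper's construction: it computes the maximal leakage of a channel, log₂ Σ_y max_x P(y|x), and the expected Hamming distortion P(X ≠ X̂) under a MAP estimator, for a simple "disclose the most likely symbols, merge the rest" mechanism of the kind the abstract describes. The function names, the alphabet, and the choice of the merge symbol `⊥` are assumptions made for the example.

```python
import math

def maximal_leakage(channel):
    """Maximal leakage in bits: log2 of the sum over outputs of the
    largest transition probability into that output."""
    ys = set()
    for row in channel.values():
        ys |= row.keys()
    return math.log2(sum(max(row.get(y, 0.0) for row in channel.values())
                         for y in ys))

def disclose_top_m(prior, m):
    """Toy mechanism: release the m most probable symbols unchanged and
    map all remaining symbols to a single suppression symbol '⊥'."""
    order = sorted(prior, key=prior.get, reverse=True)
    top = set(order[:m])
    return {x: ({x: 1.0} if x in top else {'⊥': 1.0}) for x in prior}

def expected_hamming_distortion(prior, channel):
    """E[1{X != X̂}] when X̂ is the MAP estimate of X given the output Y."""
    ys = set()
    for row in channel.values():
        ys |= row.keys()
    p_correct = sum(max(prior[x] * channel[x].get(y, 0.0) for x in prior)
                    for y in ys)
    return 1.0 - p_correct

prior = {'a': 0.4, 'b': 0.3, 'c': 0.2, 'd': 0.1}
mech = disclose_top_m(prior, 2)
leakage = maximal_leakage(mech)                      # log2(3): 3 distinct outputs
distortion = expected_hamming_distortion(prior, mech)  # only 'd' is ever misestimated
```

With `m = 2`, the mechanism reveals `a` and `b` exactly (three output symbols in total, so the leakage is log₂3 bits), and on seeing `⊥` the estimator guesses the more likely suppressed symbol `c`, so only `d` contributes to the distortion. This matches the abstract's structural result: full disclosure of the largest-probability symbols, suppression of the smallest.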