Formalizing and Estimating Distribution Inference Risks

From arXiv. Shorter version of work available at arXiv:2106.03699. Update: new version with more theoretical results and a deeper exploration of results.

Distribution inference, sometimes called property inference, infers statistical properties about a training set from access to a model trained on that data. Distribution inference attacks can pose serious risks when models are trained on private data, but are difficult to distinguish from the intrinsic purpose of statistical machine learning, namely to produce models that capture statistical properties about a distribution. Motivated by Yeom et al.'s membership inference framework, we propose a formal definition of distribution inference attacks that is general enough to describe a broad class of attacks distinguishing between possible training distributions. We show how our definition captures previous ratio-based property inference attacks as well as new kinds of attack, including revealing the average node degree or clustering coefficient of a training graph. To understand distribution inference risks, we introduce a metric that quantifies observed leakage by relating it to the leakage that would occur if samples from the training distribution were provided directly to the adversary. We report on a series of experiments across a range of different distributions using both novel black-box attacks and improved versions of the state-of-the-art white-box attacks. Our results show that inexpensive attacks are often as effective as expensive meta-classifier attacks, and that there are surprising asymmetries in the effectiveness of attacks.
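
The distinguishing setup described in the abstract, in which an adversary tries to tell which of two possible training distributions produced a model, can be illustrated with a toy simulation. The sketch below is not the paper's attack or its formal definition. It assumes a one-feature dataset whose two candidate distributions differ only in the ratio of a binary attribute, a hypothetical "model" that simply memorizes the training mean, and a hypothetical black-box adversary that thresholds a single query. All function names and parameters are illustrative.

```python
import random

def sample_dataset(ratio, n, rng):
    """Draw n binary records; each record is 1 with probability `ratio`."""
    return [1 if rng.random() < ratio else 0 for _ in range(n)]

def train(dataset):
    """Toy stand-in for training: a constant predictor returning the training mean."""
    mean = sum(dataset) / len(dataset)
    return lambda _query: mean

def adversary(model, threshold):
    """Hypothetical black-box attack: query the model once and threshold its output."""
    return 1 if model(None) > threshold else 0

def run_game(ratio0, ratio1, n, trials, seed=0):
    """Estimate how often the adversary names the correct training distribution."""
    rng = random.Random(seed)
    threshold = (ratio0 + ratio1) / 2  # assumed decision rule: midpoint of the two ratios
    correct = 0
    for _ in range(trials):
        b = rng.randrange(2)  # secret bit: which distribution the trainer samples from
        data = sample_dataset(ratio1 if b else ratio0, n, rng)
        model = train(data)
        correct += int(adversary(model, threshold) == b)
    return correct / trials

# Two candidate distributions with attribute ratios 0.2 and 0.8 (illustrative values).
accuracy = run_game(ratio0=0.2, ratio1=0.8, n=200, trials=500)
print(f"distinguishing accuracy: {accuracy:.3f}")
```

Because this toy model exposes the training mean directly, the adversary separates well-spaced ratios almost perfectly. Real models leak far less directly, which is why the abstract compares observed leakage against what direct samples from the training distribution would reveal.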
