A large body of work shows that machine learning (ML) models can leak sensitive or confidential information about their training data. Recently, leakage due to distribution inference (or property inference) attacks has been gaining attention. In such an attack, the adversary's goal is to infer distributional information about the training data. So far, research on distribution inference has focused on demonstrating successful attacks, with little attention given to identifying the potential causes of the leakage or to proposing mitigations. To bridge this gap, as our main contribution, we theoretically and empirically analyze the sources of information leakage that allow an adversary to mount distribution inference attacks. We identify three sources of leakage: (1) memorization of specific information about the conditional expectation $\mathbb{E}[Y|X]$ (the expected label given the feature values) that is of interest to the adversary, (2) a wrong inductive bias of the model, and (3) the finiteness of the training data. Next, based on our analysis, we propose principled mitigation techniques against distribution inference attacks. Specifically, we demonstrate that causal learning techniques are more resilient than associative learning methods to a particular type of distribution inference risk termed distributional membership inference. Lastly, we present a formalization of distribution inference that allows for reasoning about more general adversaries than was previously possible.
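To make the threat model concrete, the display below sketches the distinguishing-game view of distribution inference that is common in the literature (an illustrative formulation, not necessarily the exact formalization developed in this work); the symbols $\mathcal{D}_0$ and $\mathcal{D}_1$ (the two candidate training distributions), $\mathcal{T}$ (the training algorithm), $S$ (a training set of $n$ samples), $f_\theta$ (the trained model), $\mathcal{A}$ (the adversary), and $b$ (the challenger's secret bit) are our own notation:
\[
b \sim \mathrm{Uniform}\{0,1\}, \qquad
S \sim \mathcal{D}_b^{\,n}, \qquad
f_\theta \leftarrow \mathcal{T}(S), \qquad
\mathrm{Adv}(\mathcal{A}) \;=\; \Pr\!\big[\mathcal{A}(f_\theta) = b\big] - \tfrac{1}{2}.
\]
Under this formulation, the adversary observes the trained model $f_\theta$ (or query access to it) and tries to guess which of the two candidate distributions the training data was drawn from; its advantage over random guessing quantifies the distribution inference risk.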