Given the ubiquity of deep neural networks, it is important that these models do not reveal information about sensitive data that they have been trained on. In model inversion attacks, a malicious user attempts to recover the private dataset used to train a supervised neural network. A successful model inversion attack should generate realistic and diverse samples that accurately describe each of the classes in the private dataset. In this work, we provide a probabilistic interpretation of model inversion attacks, and formulate a variational objective that accounts for both diversity and accuracy. In order to optimize this variational objective, we choose a variational family defined in the code space of a deep generative model, trained on a public auxiliary dataset that shares some structural similarity with the target dataset. Empirically, our method substantially improves performance in terms of target attack accuracy, sample realism, and diversity on datasets of faces and chest X-ray images.
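To make the idea concrete, the following is a minimal sketch (not the paper's actual implementation) of a variational model inversion objective of this kind: a Gaussian variational family over the latent code of a pretrained generator is optimized to maximize the target classifier's log-likelihood for a chosen class while a KL term keeps samples close to the generator prior. All network architectures, dimensions, and hyperparameters here (`generator`, `target_classifier`, `kl_weight`, etc.) are illustrative stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the pretrained networks; in the attack setting the
# generator would be trained on a public auxiliary dataset and the target
# classifier is the private model under attack.
latent_dim, img_dim, num_classes = 64, 32 * 32, 10
generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_dim))
target_classifier = nn.Sequential(nn.Linear(img_dim, 128), nn.ReLU(), nn.Linear(128, num_classes))
for p in list(generator.parameters()) + list(target_classifier.parameters()):
    p.requires_grad_(False)  # both networks are fixed; only q(z) is optimized

# Variational family q(z) = N(mu, diag(sigma^2)) defined in the generator's code space.
mu = torch.zeros(latent_dim, requires_grad=True)
log_sigma = torch.zeros(latent_dim, requires_grad=True)
optimizer = torch.optim.Adam([mu, log_sigma], lr=1e-2)

target_class = 3   # class of the private dataset being reconstructed
num_samples = 16   # Monte Carlo samples per optimization step
kl_weight = 1e-2   # trades off prior adherence (realism/diversity) vs. attack accuracy

for step in range(200):
    optimizer.zero_grad()
    eps = torch.randn(num_samples, latent_dim)
    z = mu + log_sigma.exp() * eps                      # reparameterized samples from q(z)
    logits = target_classifier(generator(z))
    # Expected log-likelihood of the target class under the attacked classifier.
    log_lik = F.log_softmax(logits, dim=-1)[:, target_class].mean()
    # Closed-form KL(q(z) || N(0, I)) keeps codes near the generator prior.
    kl = 0.5 * (mu.pow(2) + (2 * log_sigma).exp() - 2 * log_sigma - 1).sum()
    loss = -(log_lik - kl_weight * kl)
    loss.backward()
    optimizer.step()
```

After optimization, samples drawn from q(z) and pushed through the generator serve as reconstructions of the target class; the KL term is what distinguishes this variational formulation from a point-estimate inversion, since it maintains a distribution over codes rather than collapsing to a single image.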