Model-based attacks can infer training data information from deep neural network models. These attacks depend heavily on the attacker's knowledge of the application domain, e.g., to determine the auxiliary data for model-inversion attacks. In practice, however, attackers may not know what the model is used for. We propose a generative adversarial network (GAN) based method to explore likely or similar domains of a target model -- the model domain inference (MDI) attack. For a given target (classification) model, we assume that the attacker knows nothing but the input and output formats and can query the model for a prediction on any input in the required form. Our basic idea is to let the target model influence the training of a GAN on an easily obtained dataset from a candidate domain. We find that the target model distracts the training procedure less when the candidate domain is more similar to the target domain. We then measure the distraction level by the distance between GAN-generated datasets, which can be used to rank candidate domains for the target model. Our experiments show that an auxiliary dataset from an MDI top-ranked domain can effectively boost the results of model-inversion attacks.
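To illustrate the ranking idea, the following Python sketch shows one way the distraction-based scoring could be realized. It is a minimal outline under our own assumptions, not the paper's implementation: the `train_gan` helper, the `latent_dim` attribute, the entropy-based distraction term, and the mean-feature distance (a crude stand-in for a distribution distance such as FID) are all hypothetical placeholders introduced here.

```python
import torch
import torch.nn.functional as F

def distraction_loss(generated_batch, target_model):
    """Hypothetical extra generator term: reward samples on which the
    target model makes confident (low-entropy) predictions, so the
    target model 'distracts' the GAN's training toward its own domain."""
    probs = F.softmax(target_model(generated_batch), dim=1)
    # Entropy of the target model's predictions; minimizing it steers
    # the generator toward regions the target model recognizes.
    return -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()

def dataset_distance(samples_a, samples_b):
    """Crude stand-in for a distribution distance (e.g., FID):
    L2 distance between the feature means of two generated datasets."""
    return torch.norm(samples_a.mean(dim=0) - samples_b.mean(dim=0)).item()

def rank_domains(candidate_loaders, target_model, train_gan):
    """Rank candidate domains: a smaller distance between the plain GAN's
    samples and the target-influenced GAN's samples means the target
    model distracted training less, i.e., a more similar domain.
    `train_gan(loader, target_model)` is an assumed helper returning a
    trained generator with a `latent_dim` attribute."""
    scores = {}
    for name, loader in candidate_loaders.items():
        g_plain = train_gan(loader, target_model=None)
        g_influenced = train_gan(loader, target_model=target_model)
        z = torch.randn(512, g_plain.latent_dim)
        scores[name] = dataset_distance(g_plain(z), g_influenced(z))
    # Ascending distance: top-ranked domains are the most target-like.
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Under these assumptions, the top-ranked domain returned by `rank_domains` would supply the auxiliary dataset for a downstream model-inversion attack.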