Model inversion attacks are a type of privacy attack that reconstructs private data used to train a machine learning model, solely by accessing the model. Recently, white-box model inversion attacks that leverage Generative Adversarial Networks (GANs) to distill knowledge from public datasets have received great attention because of their excellent attack performance. In contrast, current black-box model inversion attacks that utilize GANs suffer from issues such as being unable to guarantee that the attack process completes within a predetermined number of query accesses or to achieve the same level of performance as white-box attacks. To overcome these limitations, we propose a reinforcement learning-based black-box model inversion attack. We formulate the latent space search as a Markov Decision Process (MDP) and solve it with reinforcement learning. Our method uses the confidence scores of the generated images to provide rewards to an agent. Finally, the private data can be reconstructed from the latent vectors found by the agent trained on the MDP. Experimental results on various datasets and models demonstrate that our attack successfully recovers the private information of the target model, achieving state-of-the-art attack performance. By proposing a more advanced black-box model inversion attack, we emphasize the importance of research on privacy-preserving machine learning.
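To make the MDP framing concrete, the sketch below illustrates one episode of the latent-space search with confidence-score rewards. It is a minimal, hedged illustration only: `generator`, `target_model_confidences`, the random stand-in for the agent's action, and all dimensions are hypothetical placeholders, not the paper's released implementation or a specific RL algorithm.

```python
# Minimal sketch of the latent-space search viewed as an MDP (assumptions:
# the generator, target model, and agent below are illustrative placeholders).
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 100  # hypothetical latent dimensionality

def generator(z):
    # Placeholder GAN generator: maps a latent vector to a synthetic "image".
    return np.tanh(z)

def target_model_confidences(image, num_classes=10):
    # Placeholder black-box target model: only softmax confidence scores
    # are observable, no gradients (random scores here for illustration).
    logits = rng.normal(size=num_classes)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def reward(confidences, target_class):
    # Reward the agent with the confidence the target model assigns to the
    # attacked class for the generated image (a simplified reward signal).
    return float(confidences[target_class])

target_class = 3
state = rng.normal(size=LATENT_DIM)  # initial latent vector z_0 (the MDP state)
for step in range(10):
    # The agent's action perturbs the latent vector; a random stand-in here.
    action = rng.normal(scale=0.1, size=LATENT_DIM)
    next_state = state + action                      # next latent vector z_{t+1}
    image = generator(next_state)                    # query-only image synthesis
    r = reward(target_model_confidences(image), target_class)
    # A real agent (e.g., an actor-critic method) would be updated with the
    # transition (state, action, r, next_state) at this point.
    state = next_state
    print(f"step {step}: reward={r:.3f}")
```

In an actual attack, the latent vector returned after training the agent would be passed through the GAN generator to reconstruct the private data of the target class; here the loop only demonstrates how states, actions, and confidence-based rewards fit together.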