Protecting personal data against exploitation by machine learning models is of paramount importance. Recently, availability attacks have shown great promise in providing an extra layer of protection against the unauthorized use of data to train neural networks. These methods aim to add imperceptible noise to clean data so that neural networks cannot extract meaningful patterns from the protected data, claiming that they can make personal data "unexploitable." In this paper, we provide a strong countermeasure against such approaches, showing that unexploitable data might only be an illusion. In particular, we leverage the power of diffusion models and show that a carefully designed denoising process can counteract the effects of the data-protecting perturbations. We rigorously analyze our algorithm, and theoretically prove that the amount of required denoising is directly related to the magnitude of the data-protecting perturbations. Our approach, called AVATAR, delivers state-of-the-art performance against a suite of recent availability attacks in various scenarios, outperforming adversarial training. Our findings call for more research into making personal data unexploitable, showing that this goal is far from being achieved.
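To make the abstract's core mechanism concrete, the following is a minimal sketch of diffusion-based purification of protected data: the perturbed image is partially diffused with Gaussian noise and then denoised with a pretrained DDPM-style model, so that the data-protecting perturbation is washed out along with the added noise. The noise-prediction network eps_model(x, t), the linear beta schedule, and the purification strength t_star are assumptions for illustration, not details taken from the paper; in line with the theoretical claim above, t_star would be chosen to scale with the magnitude of the perturbation.

```python
# Minimal sketch of diffusion-based purification (assumptions: a DDPM-style
# noise-prediction network eps_model(x, t), a standard linear beta schedule,
# and a hypothetical purification strength t_star).
import torch

T = 1000
betas = torch.linspace(1e-4, 2e-2, T)        # standard DDPM linear schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def purify(x0, eps_model, t_star=100):
    """Forward-diffuse a (possibly perturbed) image to step t_star,
    then run the reverse DDPM chain back to a clean-looking sample."""
    # Forward process: overwhelm the data-protecting perturbation with Gaussian noise.
    eps = torch.randn_like(x0)
    x = alpha_bars[t_star].sqrt() * x0 + (1 - alpha_bars[t_star]).sqrt() * eps

    # Reverse process: step-by-step denoising from t_star down to 0.
    for t in range(t_star, -1, -1):
        t_batch = torch.full((x.shape[0],), t, dtype=torch.long)
        pred_eps = eps_model(x, t_batch)     # predicted noise at step t
        coef = betas[t] / (1 - alpha_bars[t]).sqrt()
        mean = (x - coef * pred_eps) / alphas[t].sqrt()
        if t > 0:
            mean = mean + betas[t].sqrt() * torch.randn_like(x)
        x = mean
    return x.clamp(-1, 1)
```

Purified images produced this way would then be used in place of the protected originals when training the downstream classifier.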