There is growing interest in developing unlearnable examples (UEs) to protect against visual privacy leaks on the Internet. UEs are training samples injected with invisible but unlearnable noise, which has been found to prevent unauthorized training of machine learning models. UEs are typically generated via a bilevel optimization framework with a surrogate model to remove (minimize) errors from the original samples, and are then applied to protect the data against unknown target models. However, existing UE generation methods all rely on an ideal assumption called label-consistency, under which the hackers and protectors are assumed to hold the same label for a given sample. In this work, we propose and promote a more practical label-agnostic setting, where the hackers may exploit the protected data quite differently from the protectors. For example, an m-class unlearnable dataset held by the protector may be exploited by the hacker as an n-class dataset. Existing UE generation methods are rendered ineffective in this challenging setting. To tackle this challenge, we present a novel technique called Unlearnable Clusters (UCs) that generates label-agnostic unlearnable examples with cluster-wise perturbations. Furthermore, we propose to leverage Vision-and-Language Pre-trained Models (VLPMs) like CLIP as the surrogate model to improve the transferability of the crafted UCs to diverse domains. We empirically verify the effectiveness of our approach under a variety of settings with different datasets and target models, and even on the commercial platforms Microsoft Azure and Baidu PaddlePaddle. Code is available at \url{https://github.com/jiamingzhang94/Unlearnable-Clusters}.
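To make the cluster-wise idea concrete, below is a minimal, hypothetical NumPy sketch of label-agnostic perturbation assignment: samples are clustered in a feature space (the paper uses surrogate/VLPM features; here random features and a naive k-means stand in), and every sample in a cluster receives that cluster's shared bounded perturbation. The function names, random perturbations, and the budget `eps` are illustrative assumptions, not the paper's actual generator.

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Naive k-means over feature vectors; returns a cluster label per sample."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    labels = np.zeros(len(feats), dtype=int)
    for _ in range(iters):
        # Distance of every sample to every center, then nearest assignment.
        dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = feats[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def unlearnable_clusters(images, feats, k=4, eps=8 / 255, seed=0):
    """Illustrative cluster-wise perturbation: one shared noise pattern per
    feature cluster, independent of any class labels (label-agnostic).
    Here the perturbations are random; the actual method optimizes them."""
    rng = np.random.default_rng(seed)
    labels = kmeans(feats, k, seed=seed)
    # One bounded perturbation per cluster, broadcast to all its members.
    deltas = rng.uniform(-eps, eps, size=(k,) + images.shape[1:])
    protected = np.clip(images + deltas[labels], 0.0, 1.0)
    return protected, labels
```

Because every member of a cluster shares one perturbation, the protection does not depend on how a downstream party relabels the data, which is the point of the label-agnostic setting.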