Deep neural networks (DNNs) have demonstrated their superiority in practice. Arguably, their rapid development has benefited largely from high-quality (open-sourced) datasets, with which researchers and developers can easily evaluate and improve their learning methods. Since data collection is usually time-consuming or even expensive, how to protect dataset copyrights is of great significance and worth further exploration. In this paper, we revisit dataset ownership verification. We find that existing verification methods introduce new security risks into DNNs trained on the protected dataset, due to the targeted nature of poison-only backdoor watermarks. To alleviate this problem, we explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic. Specifically, we introduce two dispersibilities and prove their correlation, based on which we design the untargeted backdoor watermark under both poisoned-label and clean-label settings. We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification. Experiments on benchmark datasets verify the effectiveness of our methods and their resistance to existing backdoor defenses. Our codes are available at \url{https://github.com/THUYimingLi/Untargeted_Backdoor_Watermark}.
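As a rough illustration of the poisoned-label variant described above, the following minimal Python sketch stamps a toy trigger pattern on a small fraction of training samples and relabels each of them to a random incorrect class, so that the induced misbehavior has no fixed target. The trigger patch, the sampling policy, and all names here are illustrative assumptions rather than the paper's exact construction; see the linked repository for the authors' implementation.

\begin{verbatim}
import random
import numpy as np

def apply_trigger(image, patch_size=3):
    # Toy trigger: stamp a small white square in the bottom-right
    # corner of an HxWxC image with pixel values in [0, 1].
    image = image.copy()
    image[-patch_size:, -patch_size:] = 1.0
    return image

def watermark_untargeted(dataset, num_classes, poison_rate=0.1):
    # Poisoned-label untargeted watermark (illustrative sketch):
    # a random subset of samples gets the trigger plus a RANDOM
    # wrong label, so triggered inputs are misclassified
    # non-deterministically, unlike a targeted watermark that
    # maps all triggered inputs to one fixed class.
    n_poison = int(len(dataset) * poison_rate)
    poison_idx = set(random.sample(range(len(dataset)), n_poison))
    watermarked = []
    for i, (image, label) in enumerate(dataset):
        if i in poison_idx:
            image = apply_trigger(image)
            label = random.choice(
                [c for c in range(num_classes) if c != label])
        watermarked.append((image, label))
    return watermarked
\end{verbatim}

Compared with a targeted watermark, which would assign every triggered sample the same attacker-chosen label, the random relabeling above yields abnormal behavior that cannot be exploited as a deterministic backdoor, which is the security benefit the abstract emphasizes.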