The rapid development of deep learning has benefited from the release of high-quality open-sourced datasets ($e.g.$, ImageNet), which allow researchers to easily verify the effectiveness of their algorithms. Almost all existing open-sourced datasets require that they be used only for academic or educational purposes rather than commercial ones, yet there is still no effective way to enforce this requirement. In this paper, we propose a backdoor-embedding-based dataset watermarking method that protects an open-sourced image-classification dataset by verifying whether it was used to train a third-party model. Specifically, the proposed method consists of two main processes: dataset watermarking and dataset verification. We adopt classical poisoning-based backdoor attacks ($e.g.$, BadNets) for dataset watermarking, $i.e.$, we generate poisoned samples by adding a certain trigger ($e.g.$, a local patch) onto some benign samples and relabeling them with a pre-defined target class. Based on this backdoor-based watermarking, we adopt a hypothesis-test-guided method for dataset verification, which examines the posterior probabilities on the target class that the suspicious third-party model produces for benign samples and their corresponding watermarked counterparts ($i.e.$, images with the trigger). Experiments on benchmark datasets verify the effectiveness of the proposed method.
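The watermarking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the trigger patch, its placement, the `poison_rate`, and all function names are assumptions chosen for clarity.

```python
import numpy as np

def stamp_trigger(image, trigger, top_left=(0, 0)):
    """Stamp a local patch trigger onto a copy of the image (BadNets-style)."""
    r, c = top_left
    h, w = trigger.shape[:2]
    out = image.copy()
    out[r:r + h, c:c + w] = trigger  # overwrite the patch region with the trigger
    return out

def watermark_dataset(images, labels, trigger, target_class,
                      poison_rate=0.1, seed=0):
    """Stamp the trigger onto a small random fraction of samples and
    relabel them with the pre-defined target class (assumed parameters)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(poison_rate * len(images)),
                     replace=False)
    images, labels = images.copy(), labels.copy()
    for i in idx:
        images[i] = stamp_trigger(images[i], trigger)
        labels[i] = target_class
    return images, labels
```

The released (watermarked) dataset is the output of `watermark_dataset`; a model trained on it is expected to inherit the trigger–target association, which the verification step later probes.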
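The hypothesis-test-guided verification can likewise be sketched. The abstract does not specify the exact test, so the version below is an assumption: a one-sided paired test on whether the trigger raises the suspicious model's target-class posterior, with the $t$ statistic approximated as standard normal for large $n$ (threshold 1.645 ≈ significance level 0.05).

```python
import math
import numpy as np

def verify_watermark(p_benign, p_triggered, threshold=1.645):
    """One-sided paired test on matched target-class posteriors.

    p_benign / p_triggered: the suspicious model's posterior probability of
    the target class on benign images and their trigger-stamped versions.
    Returns (is_watermarked, t_statistic).  Threshold and the normal
    approximation are illustrative assumptions, not the paper's exact test.
    """
    d = np.asarray(p_triggered) - np.asarray(p_benign)
    n = len(d)
    t = d.mean() / (d.std(ddof=1) / math.sqrt(n))  # paired t statistic
    return bool(t > threshold), float(t)
```

If the dataset owner's watermark was learned by the third-party model, the triggered posteriors should be systematically larger, yielding a large positive $t$ and a positive verification result.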