Open-sourced Dataset Protection via Backdoor Watermarking

The rapid development of deep learning has benefited from the release of high-quality open-sourced datasets (e.g., ImageNet), which allow researchers to easily verify the effectiveness of their algorithms. Almost all existing open-sourced datasets may be adopted only for academic or educational purposes, not commercial ones, yet there is still no good way to protect them. In this paper, we propose a backdoor-embedding-based dataset watermarking method that protects an open-sourced image-classification dataset by verifying whether it was used to train a third-party model. Specifically, the proposed method consists of two main processes: dataset watermarking and dataset verification. For dataset watermarking, we adopt classical poisoning-based backdoor attacks (e.g., BadNets), i.e., we generate poisoned samples by adding a certain trigger (e.g., a local patch) to some benign samples and labeling them with a pre-defined target class. For dataset verification, we use a hypothesis-test-guided method based on the posterior probabilities that the suspicious third-party model assigns to the target class for benign samples and for their watermarked counterparts (i.e., the same images with the trigger). Experiments on benchmark datasets verify the effectiveness of the proposed method.
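
The two processes described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names (`add_trigger`, `watermark_dataset`, `paired_t_test`), the patch size, the watermarking rate, the `margin` parameter, and the choice of a one-sided paired t-test with a normal approximation to the p-value are all assumptions made for this sketch. The abstract only states that a hypothesis test is applied to the target-class posterior probabilities of benign samples and their watermarked versions. The posteriors in the usage part are simulated stand-ins for the output of a suspicious model, not experimental results.

```python
import math
import numpy as np


def add_trigger(image, patch_size=3, value=1.0):
    """Stamp a square patch in the bottom-right corner (a BadNets-style trigger).

    Patch location, size, and value are illustrative assumptions.
    """
    out = image.copy()
    out[-patch_size:, -patch_size:, ...] = value
    return out


def watermark_dataset(images, labels, target_class, rate=0.1, seed=0):
    """Return a watermarked copy of the dataset plus the modified indices.

    A random fraction `rate` of the samples receives the trigger and is
    relabeled with `target_class`; the inputs are left untouched.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    idx = rng.choice(n, size=int(rate * n), replace=False)
    images, labels = images.copy(), labels.copy()
    for i in idx:
        images[i] = add_trigger(images[i])
    labels[idx] = target_class
    return images, labels, idx


def paired_t_test(p_benign, p_watermarked, margin=0.0):
    """One-sided paired test of H0: mean(p_watermarked - p_benign) <= margin.

    Returns the t statistic and an approximate p-value. The p-value uses the
    standard normal tail, which is an assumption that is reasonable only for
    large sample sizes.
    """
    d = np.asarray(p_watermarked) - np.asarray(p_benign) - margin
    t = d.mean() / (d.std(ddof=1) / math.sqrt(d.size))
    p = 0.5 * math.erfc(t / math.sqrt(2))  # P(Z > t) for a standard normal Z
    return t, p


# Usage with toy data: watermark 10% of a blank 8x8 grayscale dataset.
images = np.zeros((50, 8, 8))
labels = np.arange(50) % 5 + 1          # classes 1..5; class 0 is the target
wm_images, wm_labels, wm_idx = watermark_dataset(images, labels, target_class=0)

# Simulated target-class posteriors from a hypothetical suspicious model.
rng = np.random.default_rng(1)
p_benign = rng.uniform(0.0, 0.2, size=100)
p_watermarked = rng.uniform(0.8, 1.0, size=100)   # model reacts to the trigger
t_stat, p_value = paired_t_test(p_benign, p_watermarked, margin=0.2)
print("dataset likely used for training:", p_value < 0.05)
```

A small p-value rejects the null hypothesis and suggests that the trigger raises the target-class probability by more than `margin`, which is the behavior expected from a model trained on the watermarked dataset. In practice the two probability vectors would come from querying the suspicious model on the same images with and without the trigger.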

