Training deep neural networks from scratch is computationally expensive and requires large amounts of training data. Recent work has explored various watermarking techniques to protect pre-trained deep neural networks from copyright infringement. However, these techniques can be vulnerable to watermark removal attacks. In this work, we propose REFIT, a unified watermark removal framework based on fine-tuning, which does not rely on knowledge of the watermarks and is effective against a wide range of watermarking schemes. In particular, we conduct a comprehensive study of a realistic attack scenario in which the adversary has limited training data, a setting that has not been emphasized in prior work on attacks against watermarking schemes. To effectively remove the watermarks without compromising model functionality under this weak threat model, we propose two techniques that are incorporated into our fine-tuning framework: (1) an adaptation of the elastic weight consolidation (EWC) algorithm, originally proposed for mitigating the catastrophic forgetting phenomenon; and (2) unlabeled data augmentation (AU), where we leverage auxiliary unlabeled data from other sources. Our extensive evaluation demonstrates the effectiveness of REFIT against diverse watermark embedding schemes. In particular, both EWC and AU significantly decrease the amount of labeled training data needed for effective watermark removal, and the unlabeled samples used for AU need not be drawn from the same distribution as the benign data used for model evaluation. The experimental results show that our fine-tuning-based watermark removal attacks pose a real threat to the copyright of pre-trained models, highlighting the importance of further investigating the watermarking problem and designing watermark embedding schemes that are more robust against such attacks.
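To illustrate the EWC-regularized fine-tuning idea referenced above, the following is a minimal toy sketch (not the paper's implementation): each parameter is anchored to its pre-trained value with a quadratic penalty weighted by a diagonal Fisher-information estimate, so parameters deemed important to the original task move less during fine-tuning. The parameter names, learning rate, and Fisher values are illustrative assumptions.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam):
    # EWC regularizer: penalize deviation from the pre-trained weights
    # theta_star, weighted per-parameter by (diagonal) Fisher information.
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

def finetune_step(theta, grad_task, theta_star, fisher, lam, lr=0.1):
    # One gradient-descent step on: task loss + EWC penalty.
    grad_pen = lam * fisher * (theta - theta_star)
    return theta - lr * (grad_task + grad_pen)

# Toy demonstration: under the same task gradient, a parameter with high
# Fisher information (important to the original task) moves far less than
# one with low Fisher information.
theta_star = np.array([1.0, 1.0])   # pre-trained weights
fisher = np.array([10.0, 0.1])      # hypothetical Fisher estimates
grad_task = np.array([1.0, 1.0])    # identical pull on both parameters
theta = theta_star.copy()
for _ in range(50):
    theta = finetune_step(theta, grad_task, theta_star, fisher, lam=1.0)
```

In this sketch the first parameter settles near its anchored optimum (here 0.9, since the fixed point is `theta_star - grad_task / (lam * fisher)`), while the weakly regularized second parameter drifts far from its pre-trained value, which is the mechanism that lets fine-tuning alter the model while preserving the functionality encoded in important weights.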