Self-supervised learning is an emerging machine learning paradigm. Unlike supervised learning, which relies on high-quality labeled datasets, self-supervised learning uses unlabeled datasets to pre-train powerful encoders that can then serve as feature extractors for various downstream tasks. The large amounts of data and computational resources consumed in pre-training make these encoders valuable intellectual property of the model owner. Recent research has shown that a machine learning model's copyright is threatened by model stealing attacks, which aim to train a surrogate model that mimics the behavior of a given model. We empirically show that pre-trained encoders are highly vulnerable to model stealing attacks. However, most existing copyright protection algorithms, such as watermarking, concentrate on classifiers, and the intrinsic challenges of protecting the copyright of pre-trained encoders remain largely unstudied. We fill this gap by proposing SSLGuard, the first watermarking scheme for pre-trained encoders. Given a clean pre-trained encoder, SSLGuard injects a watermark into it and outputs a watermarked version. SSLGuard also applies a shadow training technique to preserve the watermark under potential model stealing attacks. Our extensive evaluation shows that SSLGuard is effective in watermark injection and verification, and that it is robust against model stealing and other watermark removal attacks such as input noising, output perturbing, overwriting, model pruning, and fine-tuning.
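To make the threat model concrete, the sketch below illustrates the kind of model stealing attack the abstract refers to: an attacker queries a black-box victim encoder with unlabeled inputs and trains a surrogate encoder to reproduce its embeddings. This is a minimal, hypothetical PyTorch example; the encoder architectures, the cosine-similarity objective, and the random query data are illustrative assumptions, not the paper's exact attack setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical encoders for illustration: `victim` stands in for the
# black-box pre-trained encoder, `surrogate` is the attacker's copy.
# Both map 3x32x32 images to 128-dimensional embeddings.
victim = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128)).eval()
surrogate = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))

optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

def stealing_step(x):
    """One stealing step: query the victim for embeddings, then train the
    surrogate to reproduce them (negative cosine similarity as the loss)."""
    with torch.no_grad():          # the attacker only observes victim outputs
        target = victim(x)
    pred = surrogate(x)
    loss = -F.cosine_similarity(pred, target, dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Unlabeled query data; a real attacker would use a public image corpus,
# here random tensors keep the sketch self-contained.
for _ in range(100):
    x = torch.rand(64, 3, 32, 32)
    stealing_step(x)
```

Because the attacker never needs labels or the victim's weights, only its output embeddings, such attacks are cheap relative to pre-training, which is what makes watermarking schemes like SSLGuard necessary.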