Self-supervised learning is an emerging machine learning (ML) paradigm. Whereas supervised learning relies on high-quality labeled datasets to achieve good performance, self-supervised learning uses unlabeled datasets to pre-train powerful encoders, which can then serve as feature extractors for various downstream tasks. Because pre-training consumes huge amounts of data and computational resources, the encoders themselves become valuable intellectual property of the model owner. Recent research has shown that the copyright of ML models is threatened by model stealing attacks, which train a surrogate model to mimic the behavior of a given model. We empirically show that pre-trained encoders are highly vulnerable to model stealing attacks. However, most current copyright protection algorithms, such as watermarking, concentrate on classifiers, and the intrinsic challenges of protecting the copyright of pre-trained encoders remain largely unstudied. We fill this gap by proposing SSLGuard, the first watermarking algorithm for pre-trained encoders. Given a clean pre-trained encoder, SSLGuard injects a watermark into it and outputs a watermarked version. A shadow training technique is also applied to preserve the watermark under potential model stealing attacks. Our extensive evaluation shows that SSLGuard is effective in watermark injection and verification, and that it is robust against model stealing and other watermark removal attacks such as input noising, output perturbing, overwriting, model pruning, and fine-tuning.
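To make the threat model concrete, the following is a minimal sketch of a model stealing attack against an encoder: the adversary queries the victim with unlabeled inputs and trains a surrogate to reproduce the returned embeddings. It is an illustration only, not the attack or the SSLGuard procedure evaluated in the paper. The toy encoder (a linear map followed by tanh), the function names `victim_encoder`, `steal_encoder`, and `mean_cosine`, and all hyperparameters are assumptions introduced here.

```python
import numpy as np


def victim_encoder(x, W):
    """Hypothetical stand-in for a pre-trained encoder: linear map + tanh."""
    return np.tanh(x @ W)


def steal_encoder(query_fn, inputs, dim_out, lr=0.5, epochs=500, seed=0):
    """Fit a surrogate whose embeddings match the victim's on `inputs`.

    The adversary only sees `query_fn` as a black box that maps inputs to
    embeddings. The surrogate is assumed to share the toy architecture.
    """
    rng = np.random.default_rng(seed)
    W_s = rng.normal(scale=0.1, size=(inputs.shape[1], dim_out))
    targets = query_fn(inputs)  # black-box queries to the victim
    for _ in range(epochs):
        pred = np.tanh(inputs @ W_s)
        err = pred - targets
        # Gradient of 0.5 * mean squared embedding error w.r.t. W_s.
        grad = inputs.T @ (err * (1.0 - pred ** 2)) / len(inputs)
        W_s -= lr * grad
    return W_s


def mean_cosine(a, b):
    """Average cosine similarity between corresponding rows of a and b."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / den))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    W_v = rng.normal(scale=0.3, size=(8, 4))   # victim's secret weights
    queries = rng.normal(size=(512, 8))        # unlabeled query data
    holdout = rng.normal(size=(128, 8))        # unseen evaluation inputs

    W_s = steal_encoder(lambda x: victim_encoder(x, W_v), queries, dim_out=4)
    sim = mean_cosine(victim_encoder(holdout, W_v),
                      victim_encoder(holdout, W_s))
    print(f"mean cosine similarity on held-out inputs: {sim:.3f}")
```

In this toy setting the surrogate needs no labels, only query access to the embeddings, which is why an encoder exposed as a feature-extraction service is an attractive target. A watermark that is meant to survive such an attack has to carry over to the surrogate's embeddings, which is the role the abstract assigns to shadow training.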