Self-supervised learning has achieved revolutionary progress in the past several years and is commonly believed to be a promising approach for general-purpose AI. In particular, self-supervised learning aims to pre-train an encoder using a large amount of unlabeled data. The pre-trained encoder is like an "operating system" of the AI ecosystem: it can be used as a feature extractor for many downstream tasks with little or no labeled training data. Existing studies on self-supervised learning have mainly focused on pre-training a better encoder to improve its performance on downstream tasks in non-adversarial settings, leaving its security and privacy in adversarial settings largely unexplored. Because many downstream tasks build on the same encoder, a security or privacy issue in a pre-trained encoder becomes a single point of failure for the AI ecosystem. In this book chapter, we discuss 10 basic security and privacy problems for pre-trained encoders in self-supervised learning, including six confidentiality problems, three integrity problems, and one availability problem. For each problem, we discuss potential opportunities and challenges. We hope our book chapter will inspire future research on the security and privacy of self-supervised learning.