Self-supervised representation learning techniques have been developing rapidly to make full use of unlabeled images. They encode images into rich features that are agnostic to downstream tasks. Behind their revolutionary representation power, the requirements for dedicated model designs and massive computational resources expose image encoders to the risk of model stealing attacks: a cheap way to mimic the performance of a well-trained encoder while circumventing those demanding requirements. Yet conventional attacks target only supervised classifiers, given their predicted labels and/or posteriors, which leaves the vulnerability of unsupervised encoders unexplored. In this paper, we first instantiate the conventional stealing attacks against encoders and demonstrate that encoders are more vulnerable than downstream classifiers. To better leverage the rich representations of encoders, we further propose Cont-Steal, a contrastive-learning-based attack, and validate its improved stealing effectiveness in various experimental settings. As a takeaway, we call on the community to pay attention to the intellectual property protection of representation learning techniques, especially to defenses against encoder stealing attacks like ours.
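To make the contrast between the two attack families concrete, the following is a minimal sketch, not the paper's implementation. It assumes the attacker can query the target encoder for embeddings and trains a surrogate encoder on those outputs. The function names (`mse_stealing_loss`, `contrastive_stealing_loss`), the `temperature` parameter, and the InfoNCE-style form of the loss are illustrative assumptions; the exact Cont-Steal objective is not given in this abstract and may differ.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mse_stealing_loss(surrogate, target):
    """Conventional-style objective (assumed form): pull each surrogate
    embedding toward the target encoder's embedding of the same image."""
    total = 0.0
    for s, t in zip(surrogate, target):
        total += sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
    return total / len(surrogate)


def contrastive_stealing_loss(surrogate, target, temperature=0.1):
    """InfoNCE-style objective (assumed form): the surrogate embedding of
    image i should be more similar to the target embedding of image i than
    to the target embeddings of the other images in the batch."""
    s_norm = [l2_normalize(s) for s in surrogate]
    t_norm = [l2_normalize(t) for t in target]
    total = 0.0
    for i, s in enumerate(s_norm):
        logits = [dot(s, t) / temperature for t in t_norm]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(s_norm)


# Toy batch: two images with two-dimensional embeddings from the target encoder.
target_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # surrogate agrees with the target
swapped = [[0.0, 1.0], [1.0, 0.0]]   # surrogate confuses the two images

loss_aligned = contrastive_stealing_loss(aligned, target_emb)
loss_swapped = contrastive_stealing_loss(swapped, target_emb)
```

In this toy batch the contrastive loss is small when the surrogate's embeddings line up with the target's and large when they are swapped, which is the behavior a contrastive stealing objective relies on: it uses the other images in the batch as negatives rather than only matching each pair in isolation.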