Self-supervised models are increasingly prevalent in machine learning (ML) since they reduce the need for costly labeled data. Because of their versatility in downstream applications, they are increasingly used as a service exposed via public APIs. At the same time, these encoder models are particularly vulnerable to model stealing attacks due to the high dimensionality of the vector representations they output. Yet encoders remain undefended: existing mitigation strategies for stealing attacks focus on supervised learning. We introduce a new dataset inference defense, which uses the private training set of the victim encoder model to attribute its ownership in the event of stealing. The intuition is that the log-likelihood of an encoder's output representations is higher on the victim's training data than on test data if the encoder was stolen from the victim, but not if it was independently trained. We compute this log-likelihood using density estimation models. As part of our evaluation, we also propose measuring the fidelity of stolen encoders and quantifying the effectiveness of the theft detection without involving downstream tasks; instead, we leverage mutual information and distance measurements. Our extensive empirical results in the vision domain demonstrate that dataset inference is a promising direction for defending self-supervised models against model stealing.
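The core test above can be sketched numerically: fit a density model to the victim encoder's representations of its private training data, then compare the mean log-likelihood of a suspect encoder's representations against it. This is a minimal sketch with synthetic representations and a single multivariate Gaussian as the density estimator; the actual defense uses a learned density model, and all distribution parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # representation dimensionality (illustrative)

# Stand-in for the victim encoder's representations of its private training set.
train_reps = rng.normal(loc=1.0, scale=0.5, size=(500, d))

# Fit a multivariate Gaussian density estimator to the training representations.
mu = train_reps.mean(axis=0)
cov = np.cov(train_reps, rowvar=False) + 1e-6 * np.eye(d)  # regularized covariance
cov_inv = np.linalg.inv(cov)
_, logdet = np.linalg.slogdet(cov)

def mean_log_likelihood(x):
    """Mean Gaussian log-density of representations x under the fitted model."""
    diff = x - mu
    maha = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distances
    return float(np.mean(-0.5 * (maha + logdet + d * np.log(2 * np.pi))))

# A stolen encoder approximately reproduces the victim's representation
# distribution on the victim's training data, so it scores high under the
# density model; an independently trained encoder's representations do not.
stolen_reps = rng.normal(loc=1.0, scale=0.5, size=(200, d))
independent_reps = rng.normal(loc=-1.0, scale=0.5, size=(200, d))

ll_stolen = mean_log_likelihood(stolen_reps)
ll_independent = mean_log_likelihood(independent_reps)
assert ll_stolen > ll_independent  # ownership signal: higher likelihood => stolen
```

In practice the victim would compare likelihoods on its private training set versus held-out test data and apply a statistical test, rather than thresholding a single score as done here.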