When data is publicly released for human consumption, it is unclear how to prevent its unauthorized use for machine learning purposes. Successful model training may be preventable with carefully designed dataset modifications, and we present a proof-of-concept approach for the image classification setting. We propose methods based on the notion of adversarial shortcuts, which encourage models to rely on non-robust signals rather than semantic features. Our experiments demonstrate that these measures successfully prevent deep learning models from achieving high accuracy on real, unmodified data examples.
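To make the idea concrete, here is a minimal sketch of one possible instantiation of an adversarial shortcut: a fixed, label-keyed perturbation pattern added to every training image, so that a model can minimize training loss by reading the pattern rather than the image semantics, and consequently fails on clean test data. The function names, the sign-pattern construction, and the perturbation budget `epsilon` are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def make_shortcut_patterns(num_classes, image_shape, epsilon=8 / 255, seed=0):
    """Generate one fixed random sign pattern per class, scaled to a small budget.

    Each pattern is a +/- epsilon array with the same shape as an image, so the
    perturbation is visually subtle but perfectly correlated with the label.
    """
    rng = np.random.default_rng(seed)
    return epsilon * rng.choice([-1.0, 1.0], size=(num_classes, *image_shape))

def add_adversarial_shortcuts(images, labels, patterns):
    """Add each example's label-keyed pattern, clipping to the valid pixel range.

    images: float array in [0, 1], shape (N, H, W, C)
    labels: int array of class indices, shape (N,)
    patterns: output of make_shortcut_patterns, shape (K, H, W, C)
    """
    shortcut = patterns[labels]  # fancy indexing: one pattern per example
    return np.clip(images + shortcut, 0.0, 1.0)

# Usage sketch: modify a training set before public release.
# patterns = make_shortcut_patterns(num_classes=10, image_shape=(32, 32, 3))
# released_images = add_adversarial_shortcuts(train_images, train_labels, patterns)
```

Because the shortcut is a far easier signal to fit than the true semantic features, a network trained on the released set tends to latch onto it; on unmodified test images the shortcut is absent and accuracy collapses, which is the preventive effect the abstract describes.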