The volume of "free" data on the internet has been key to the current success of deep learning. However, it also raises privacy concerns about the unauthorized exploitation of personal data for training commercial models. It is thus crucial to develop methods that prevent unauthorized data exploitation. This paper raises the question: \emph{can data be made unlearnable for deep learning models?} We present a type of \emph{error-minimizing} noise that can indeed make training examples unlearnable. Error-minimizing noise is intentionally generated to reduce the training error of one or more training examples to close to zero, which tricks the model into believing there is "nothing" to learn from these examples. The noise is restricted to be imperceptible to human eyes, and thus does not affect normal data utility. We empirically verify the effectiveness of error-minimizing noise in both its sample-wise and class-wise forms. We also demonstrate its flexibility under extensive experimental settings and its practicality in a case study on face recognition. Our work establishes an important first step towards making personal data unexploitable by deep learning models.
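The noise-generation step can be viewed as the inner problem of a min-min optimization, $\min_\theta \min_{\|\delta\|_\infty \le \epsilon} \mathcal{L}(f_\theta(x+\delta), y)$: instead of maximizing the loss as in adversarial training, the perturbation descends it. Below is a minimal, illustrative sketch of that inner step under assumed settings (a PyTorch-style classifier, cross-entropy loss, and hypothetical hyperparameters \texttt{epsilon}, \texttt{step\_size}, \texttt{steps}); it is not the authors' exact procedure, which alternates this step with model updates on the perturbed data.

\begin{verbatim}
import torch
import torch.nn.functional as F

def error_minimizing_noise(model, x, y, epsilon=8/255, step_size=2/255, steps=20):
    # Illustrative sketch: PGD-style *descent* on the training loss with respect
    # to the perturbation, projected into an L-infinity ball of radius epsilon
    # so the noise stays imperceptible. Hyperparameters are placeholders.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Step against the gradient (opposite of adversarial noise), then project.
        delta = (delta - step_size * grad.sign()).clamp(-epsilon, epsilon)
        delta = delta.detach().requires_grad_(True)
    return delta.detach()
\end{verbatim}

For class-wise noise, a single shared perturbation per class would be optimized over all examples of that class rather than a separate perturbation per example.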