In this paper, we raise an emerging personal data protection problem: user personal data (e.g., images) could be inappropriately exploited to train deep neural network models without authorization. To solve this problem, we revisit traditional watermarking in advanced machine learning settings. By embedding a watermarking signature into user images via a specialized linear color transformation, neural models become imprinted with this signature whenever the training data include watermarked images. A third-party verifier can then detect potential unauthorized usage by inferring the watermark signature from a neural model. We further explore the desired properties of the watermarking scheme and the signature space for convincing verification. Through extensive experiments, we show empirically that linear color transformation is effective in protecting users' personal images across various realistic settings. To the best of our knowledge, this is the first work to protect users' personal data from unauthorized usage in neural network training.
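The embedding step described in the abstract, a per-pixel linear color transformation parameterized by a secret signature, can be sketched as follows. The parameterization as a 3x3 matrix `A` plus bias `b` and the near-identity choice are illustrative assumptions for this sketch; the paper's exact transformation may differ.

```python
import numpy as np

def embed_watermark(image, key_matrix, key_bias):
    """Apply a per-pixel linear color transform x -> A @ x + b.

    image: H x W x 3 float array with values in [0, 1].
    key_matrix (A) and key_bias (b) act as the secret watermark
    signature (hypothetical parameterization for illustration).
    """
    h, w, c = image.shape
    flat = image.reshape(-1, c)              # (H*W, 3) pixel rows
    out = flat @ key_matrix.T + key_bias     # same linear map on every pixel
    return np.clip(out, 0.0, 1.0).reshape(h, w, c)

# A near-identity transform keeps the watermark visually subtle,
# while models trained on marked images can still absorb the signature.
rng = np.random.default_rng(0)
A = np.eye(3) + 0.02 * rng.standard_normal((3, 3))
b = 0.01 * rng.standard_normal(3)

img = rng.random((8, 8, 3))
marked = embed_watermark(img, A, b)
```

A verifier holding `(A, b)` would probe a suspect model for behavior consistent with this color shift rather than inspecting the images themselves.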