Large organizations such as social media companies continually release data, for example user images. At the same time, these organizations leverage their massive corpora of released data to train proprietary models that give them an edge over their competitors. These two behaviors can be in conflict, as an organization wants to prevent competitors from using its own publicly released data to replicate the performance of its proprietary models. We solve this problem by developing a data poisoning method by which publicly released data can be minimally modified to prevent others from training models on it. Moreover, our method can be used in an online fashion so that companies can protect their data in real time as they release it. We demonstrate the success of our approach on ImageNet classification and on facial recognition.