Information leakage is becoming a critical problem as all kinds of information are mistakenly made public, and machine learning models are trained on that data to provide services. As a result, one's private information can easily be memorized by such trained models. Unfortunately, deleting the information is not an option, as the data has already been exposed to the Web or third-party platforms. Moreover, we cannot necessarily control how other parties label the data or train their models either. In this setting, we study the problem of targeted disinformation, where the goal is to lower the accuracy of inference attacks on a specific target (e.g., a person's profile) using only data insertion. While our problem is related to data privacy and defenses against exploratory attacks, our techniques are inspired by targeted data poisoning attacks, with some key differences. We show that our problem is best solved by finding the points closest to the target in the input space that will be labeled as a different class. Since we do not control the labeling process, we instead conservatively estimate the labels probabilistically by combining the decision boundaries of multiple classifiers using data programming techniques. We also propose techniques for making the disinformation realistic. Our experiments show that a probabilistic decision boundary can be a good proxy for labelers, and that our approach outperforms other targeted poisoning methods when using end-to-end training on real datasets.
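To make the core idea concrete, the following is a minimal sketch: among candidate disinformation points near the target, select the one closest in input space that an ensemble of classifiers would, with high probability, label as a different class. This is an illustration only, not the paper's implementation; the scikit-learn labelers, the unweighted probability averaging (a simple stand-in for the data-programming combination of decision boundaries), and all names and parameters (`find_disinformation`, `threshold`) are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def find_disinformation(target, candidates, X, y, threshold=0.9):
    """Pick the candidate closest to the target that the proxy labelers
    would, with probability >= threshold, assign a different class.

    Hypothetical sketch: the ensemble below stands in for the paper's
    probabilistic combination of multiple decision boundaries.
    """
    # Proxy labelers: multiple classifiers trained on the (uncontrolled)
    # labeled data; their combined predictions estimate how an unknown
    # labeler would label a new point.
    labelers = [
        LogisticRegression(max_iter=1000),
        SVC(probability=True),
        DecisionTreeClassifier(max_depth=5),
    ]
    for clf in labelers:
        clf.fit(X, y)

    # Class the ensemble assigns to the target itself.
    target_probs = np.mean(
        [clf.predict_proba(target.reshape(1, -1)) for clf in labelers], axis=0
    )
    target_class = int(np.argmax(target_probs))

    # Probability each candidate is labeled as a *different* class,
    # averaged over labelers (unweighted average as a simple surrogate).
    diff_probs = np.mean(
        [1.0 - clf.predict_proba(candidates)[:, target_class] for clf in labelers],
        axis=0,
    )

    # Conservative estimate: keep only candidates very likely to receive
    # a different label, then return the one closest to the target.
    ok = diff_probs >= threshold
    if not ok.any():
        return None
    dists = np.linalg.norm(candidates - target, axis=1)
    dists[~ok] = np.inf
    return candidates[np.argmin(dists)]
```

The `threshold` trades off effectiveness against conservativeness: a higher value only accepts candidates whose different-class label is near certain, which matters because the defender never observes the actual labeling process.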