Information leakage is becoming a critical problem as sensitive information is mistakenly made public and machine learning models are trained on that data to provide services. As a result, one's private information can easily be memorized by such trained models. Unfortunately, deleting the information is out of the question because the data is already exposed on the Web or to third-party platforms. Moreover, we cannot necessarily control the labeling process or the model training performed by other parties. In this setting, we study the problem of targeted disinformation generation, where the goal is to dilute the data and thus make a model safer and more robust against inference attacks on a specific target (e.g., a person's profile) by only inserting new data. Our method finds the points closest to the target in the input space that will be labeled as a different class. Since we cannot control the labeling process, we instead conservatively estimate the labels probabilistically by combining the decision boundaries of multiple classifiers using data programming techniques. Our experiments show that a probabilistic decision boundary can be a good proxy for labelers, and that our approach is effective in defending against inference attacks and scales to large data.
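To make the label-estimation idea concrete, the following is a minimal sketch, not the paper's implementation, assuming scikit-learn and a toy binary task. Several classifiers with different inductive biases act as proxy labelers; their predicted probabilities are averaged as a simple stand-in for the accuracy-weighted combination a data programming system (e.g., Snorkel) would learn, and only candidate points near the target that the proxies conservatively agree would receive a different label are kept as disinformation. The 0.9 threshold and the sampling scale are illustrative choices.

```python
# A minimal sketch of probabilistic label estimation via multiple proxy
# classifiers (an assumption-laden illustration, not the authors' method).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy data standing in for the public pool a model trainer would scrape.
X, y = make_classification(n_samples=500, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=0)

# The target record to protect and its sensitive class.
target_x, target_class = X[0], y[0]

# Classifiers with different decision boundaries serve as proxy labelers.
proxies = [
    LogisticRegression().fit(X, y),
    RandomForestClassifier(random_state=0).fit(X, y),
    SVC(probability=True, random_state=0).fit(X, y),
]

def other_class_prob(x):
    """Average probability, over the proxy labelers, that x is labeled
    differently from the target's class (a plain average stands in for
    the learned combination data programming would produce)."""
    probs = np.mean([p.predict_proba(x[None, :])[0] for p in proxies], axis=0)
    return 1.0 - probs[target_class]

# Sample candidate points around the target and keep the closest ones that
# the proxies conservatively agree would be labeled as a different class.
rng = np.random.default_rng(0)
candidates = target_x + rng.normal(scale=1.0, size=(2000, X.shape[1]))
safe = [c for c in candidates if other_class_prob(c) > 0.9]
safe.sort(key=lambda c: np.linalg.norm(c - target_x))
disinformation = np.array(safe[:10])  # the 10 closest qualifying points
print(disinformation)
```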