Recent work has demonstrated the successful extraction of training data from generative language models. However, it is not evident whether such extraction is feasible in text classification models, since the training objective is to predict a class label rather than the next word. This poses an interesting challenge and raises an important question regarding the privacy of training data in text classification settings. We therefore study potential privacy leakage in the text classification domain by investigating unintended memorization of training data that is not pertinent to the learning task. We propose an algorithm that extracts the missing tokens of a partial text by exploiting the class-label likelihoods provided by the model. We test its effectiveness by inserting canaries into the training set and attempting to extract their tokens after training. Our experiments demonstrate that successful extraction is possible to some extent. The method can also serve as an auditing strategy to assess potential unauthorized use of personal data without consent.
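To make the idea concrete, the following is a minimal, brute-force sketch of token extraction via class-label likelihood. It assumes only query access to the trained classifier's label probabilities; `label_likelihood`, `extract_missing_tokens`, and the `[MASK]` placeholder convention are hypothetical names introduced for illustration, and the actual algorithm may search the candidate space more efficiently (e.g., greedily or with a beam) rather than exhaustively.

```python
# Sketch of extracting unknown canary tokens by exploiting the label
# likelihood of a trained text classifier. Hypothetical interface; not
# the paper's exact algorithm.
from itertools import product


def label_likelihood(text: str, label: str) -> float:
    """Placeholder: return P(label | text) from the classifier under audit."""
    raise NotImplementedError("wire this to the trained model being audited")


def extract_missing_tokens(template: str, label: str,
                           vocab: list[str], n_blanks: int):
    """Fill the [MASK] slots in `template` with the candidate tokens that
    maximize the classifier's likelihood for the known training label."""
    best_fill, best_score = None, float("-inf")
    for candidate in product(vocab, repeat=n_blanks):
        text = template
        for token in candidate:
            # Fill blanks left to right with the current candidate tokens.
            text = text.replace("[MASK]", token, 1)
        score = label_likelihood(text, label)
        if score > best_score:
            best_fill, best_score = candidate, score
    return best_fill, best_score


# Example: recover two unknown digits of a canary whose training label is known.
# tokens, score = extract_missing_tokens(
#     "my secret code is [MASK] [MASK]", label="positive",
#     vocab=[str(d) for d in range(10)], n_blanks=2)
```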