In text processing, most ML models are built on word embeddings. These embeddings are themselves trained on datasets that may contain sensitive data. In some cases this training is done independently; in others, it occurs as part of training a larger, task-specific model. In either case, it is of interest to study membership inference attacks on the embedding layer as a way of understanding leakage of sensitive information. Yet, somewhat surprisingly, membership inference attacks on word embeddings, and their effect on downstream natural language processing (NLP) tasks that use these embeddings, have remained relatively unexplored. In this work, we show that word embeddings are vulnerable to black-box membership inference attacks under realistic assumptions. Furthermore, we show that this leakage persists through two other major NLP applications, classification and text generation, even when the embedding layer is not exposed to the attacker. Our MI attack achieves high attack accuracy against both a classification model and an LSTM-based language model. Moreover, our attack offers a cheaper route to membership inference on text-generative models: it requires neither knowledge of the target model nor the expensive training of text-generative models as shadow models.
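To make the black-box setting concrete, the following is a minimal sketch of a similarity-threshold membership test against a word embedding. It assumes the attacker can only query the embedding (here modeled as a dictionary mapping words to vectors) and calibrates a decision threshold on public, known non-member text; this heuristic is an illustrative assumption, not necessarily the exact attack procedure used in this work.

```python
import numpy as np

def window_similarity(embeddings, window):
    """Average pairwise cosine similarity of the word vectors in one window."""
    vecs = [embeddings[w] for w in window if w in embeddings]
    if len(vecs) < 2:
        return 0.0
    vecs = np.stack(vecs)
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T
    n = len(vecs)
    # Average over off-diagonal pairs (the diagonal is all ones after normalization).
    return float((sims.sum() - n) / (n * (n - 1)))

def membership_score(embeddings, tokens, window_size=5):
    """Score a candidate text; higher suggests it was in the embedding's training data,
    since words that co-occur in training text tend to end up with more similar vectors."""
    windows = [tokens[i:i + window_size]
               for i in range(max(1, len(tokens) - window_size + 1))]
    return float(np.mean([window_similarity(embeddings, w) for w in windows]))

def infer_membership(embeddings, tokens, threshold):
    # Predict "member" when the score exceeds a threshold the attacker
    # calibrates on public data known not to be in the training set.
    return membership_score(embeddings, tokens) > threshold
```

The same scoring function can be reused in the downstream settings: instead of querying the embedding directly, the attacker derives comparable signals from the outputs of the classifier or language model built on top of it.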