Named entity recognition (NER) models are widely used to identify named entities (e.g., individuals, locations, and other information) in text documents. Machine learning based NER models are increasingly applied in privacy-sensitive applications that require automatic and scalable identification of sensitive information in order to redact text for data sharing. In this paper, we study the setting in which NER models are available as a black-box service for identifying sensitive information in user documents, and we show that these models are vulnerable to membership inference on their training datasets. Using updated pre-trained NER models from spaCy, we demonstrate two distinct membership inference attacks on these models. Our first attack exploits unintended memorization in the NER model's underlying neural network, a phenomenon to which neural networks are known to be susceptible. Our second attack leverages a timing side channel to target NER models that maintain vocabularies constructed from the training data: words from the training dataset follow different functional paths than previously unseen words, resulting in measurable differences in execution time. Revealing the membership status of training samples has clear privacy implications; for example, in text redaction, the sensitive words or phrases that are to be found and removed are at risk of being detected as members of the training dataset. Our experimental evaluation covers the redaction of both password and health data, presenting both security risks and privacy/regulatory issues, and these risks are exacerbated by results showing memorization of even a single phrase. We achieve 70% AUC in our first attack on a text redaction use case, and we show overwhelming success in the timing attack with 99.23% AUC. Finally, we discuss potential mitigation approaches to enable the safe use of NER models in light of the privacy and security implications of membership inference attacks.
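To make the timing side channel concrete, the following is a minimal, hypothetical sketch (not the paper's actual attack pipeline) of how one might measure vocabulary-lookup timing in spaCy. It assumes a blank pipeline whose StringStore is seeded with a stand-in "training" word; the model name, word choices, and trial count are illustrative assumptions only, and the real source of the timing difference in a deployed NER service may differ.

```python
import time
import statistics
import spacy

# Minimal sketch, assuming a blank spaCy pipeline whose StringStore is seeded
# with a stand-in word; a deployed NER model's vocabulary would instead
# reflect its training data.
nlp = spacy.blank("en")
nlp.vocab.strings.add("hospital")  # stands in for a word seen during training

def median_lookup_time(word, trials=10000):
    """Median wall-clock time of a StringStore membership check."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        _ = word in nlp.vocab.strings  # lookup path differs for stored vs. unseen strings
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

print(f"seen word   : {median_lookup_time('hospital'):.3e} s")
print(f"unseen word : {median_lookup_time('xqzvtoken'):.3e} s")
```

In a black-box setting, an adversary would issue repeated queries containing candidate words and compare response-time distributions rather than instrumenting the vocabulary directly, but the underlying signal, a measurable gap between members and non-members of the vocabulary, is the same idea sketched above.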