Large Pre-trained Language Models (PLMs) have facilitated and dominated many NLP tasks in recent years. Despite their great success, PLMs also raise privacy concerns. For example, recent studies show that PLMs memorize a great deal of training data, including sensitive information, which may be leaked unintentionally and exploited by malicious attackers. In this paper, we measure whether PLMs are prone to leaking personal information. Specifically, we query PLMs for email addresses using either contexts of the email address or prompts containing the owner's name. We find that PLMs do leak personal information due to memorization. However, the risk of an attacker extracting specific personal information is low, because the models are weak at associating personal information with its owner. We hope this work helps the community better understand the privacy risks of PLMs and brings new insights into making PLMs safe.
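To make the two query styles concrete, the following is a minimal sketch of how such probes could be issued, using the HuggingFace transformers library with GPT-2 as a stand-in model. The prompt templates, the example name, and the decoding settings are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: probing a causal PLM with (a) a context-style prefix and
# (b) a name-based prompt, then reading the generated continuation.
# GPT-2 and the prompts below are assumptions for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = [
    # Context-style query: a prefix that might precede an email address
    # in the training corpus; the model is asked to continue it.
    "Please send the report to the following address: ",
    # Association-style query: a prompt naming a (hypothetical) owner,
    # asking the model to fill in the address.
    "The email address of John Doe is ",
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding of a short continuation; memorized strings,
    # if any, would surface here.
    outputs = model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    continuation = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    print(f"{prompt!r} -> {continuation!r}")
```

Under this sketch, a continuation matching a real address from the training data would indicate memorization; whether the address matches the named owner probes the association ability discussed above.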