Language Models (LMs) have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks. Understanding the risk of LMs leaking Personally Identifiable Information (PII) has received less attention, which can be attributed to the false assumption that dataset curation techniques such as scrubbing are sufficient to prevent PII leakage. Scrubbing techniques reduce but do not prevent the risk of PII leakage: in practice, scrubbing is imperfect and must balance the trade-off between minimizing disclosure and preserving the utility of the dataset. On the other hand, it is unclear to what extent algorithmic defenses such as differential privacy, designed to guarantee sentence- or user-level privacy, prevent PII disclosure. In this work, we propose (i) a taxonomy of PII leakage in LMs, (ii) metrics to quantify PII leakage, and (iii) attacks showing that PII leakage is a threat in practice. Our taxonomy provides rigorous game-based definitions for PII leakage via black-box extraction, inference, and reconstruction attacks with only API access to an LM. We empirically evaluate attacks against GPT-2 models fine-tuned on three domains: case law, health care, and e-mails. Our main contributions are (i) novel attacks that can extract up to 10 times more PII sequences than existing attacks, (ii) showing that sentence-level differential privacy reduces the risk of PII disclosure but still leaks about 3% of PII sequences, and (iii) a subtle connection between record-level membership inference and PII reconstruction.
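To make the black-box extraction setting concrete, the sketch below samples unconditionally from a language model through its generation API and tags PII-like entities in the generated text with an off-the-shelf NER model. This is an illustration under stated assumptions, not the paper's implementation: the "gpt2" checkpoint stands in for a fine-tuned model, the sampling parameters are arbitrary, and spaCy PERSON entities serve only as a rough proxy for PII.

```python
# Minimal sketch of black-box PII extraction: sample from an LM via its
# generation API, then count PII-like entities in the samples.
# Assumptions: Hugging Face `transformers` and spaCy are installed; the
# base "gpt2" checkpoint is a placeholder for a fine-tuned model, and
# spaCy PERSON entities are used as a crude stand-in for a PII tagger.
from collections import Counter

import torch
import spacy
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # placeholder checkpoint
model.eval()
ner = spacy.load("en_core_web_sm")  # PERSON entities as a PII proxy


def sample_pii(num_samples: int = 100, max_length: int = 64) -> Counter:
    """Generate unconditional samples and count PII-like entities in them."""
    counts = Counter()
    bos = torch.tensor([[tokenizer.bos_token_id]])  # empty prompt
    with torch.no_grad():
        for _ in range(num_samples):
            out = model.generate(
                bos,
                do_sample=True,      # stochastic decoding, as in extraction attacks
                top_k=40,            # illustrative sampling parameter
                max_length=max_length,
                pad_token_id=tokenizer.eos_token_id,
            )
            text = tokenizer.decode(out[0], skip_special_tokens=True)
            for ent in ner(text).ents:
                if ent.label_ == "PERSON":
                    counts[ent.text] += 1
    return counts


if __name__ == "__main__":
    extracted = sample_pii(num_samples=10)
    print(extracted.most_common(5))
```

In this sketch, repeated occurrences of the same entity across samples are what an attacker would inspect first; entities the base model is unlikely to produce by chance are candidate memorized PII from the fine-tuning data.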