Pre-trained language models (PTLMs) have achieved remarkable performance across a wide range of natural language processing (NLP) tasks. However, there are growing concerns about the potential security issues that arise from the adoption of PTLMs. In this survey, we comprehensively systematize recently discovered threats to PTLM systems and applications. We characterize these attacks from three perspectives. (1) We show that threats can be raised by different malicious entities at different stages of the PTLM pipeline. (2) We identify two types of model transferability (landscape and portrait) that facilitate attacks. (3) Based on attack goals, we summarize four categories of attacks (backdoor, evasion, data privacy, and model privacy). We also discuss several open problems and research directions. We believe our survey and taxonomy will inspire future studies towards secure and privacy-preserving PTLMs.