Natural language processing (NLP) tasks, ranging from text classification to text generation, have been revolutionised by pre-trained language models such as BERT. This allows corporations to easily build powerful APIs by encapsulating fine-tuned BERT models for downstream tasks. However, when a fine-tuned BERT model is deployed as a service, it may suffer from different attacks launched by malicious users. In this work, we first present how an adversary can steal a BERT-based API service (the victim/target model) on multiple benchmark datasets with limited prior knowledge and queries. We further show that the extracted model can lead to highly transferable adversarial attacks against the victim model. Our studies indicate that the potential vulnerabilities of BERT-based API services persist, even when there is an architectural mismatch between the victim model and the attack model. Finally, we investigate two defence strategies to protect the victim model and find that, unless the performance of the victim model is sacrificed, both model extraction and adversarial transferability can effectively compromise the target models.
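To make the extraction setting concrete, the sketch below illustrates the general model-extraction loop described above: the attacker labels its own queries with the black-box victim API and fine-tunes a local imitation model on those outputs. The function `query_victim_api`, the choice of `bert-base-uncased` as the attack architecture, and all hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of black-box model extraction against a BERT-based API.
# Assumes `transformers` and `torch` are installed; query_victim_api() is a
# hypothetical placeholder for calls to the victim service.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def query_victim_api(texts):
    """Hypothetical black-box victim API: returns one predicted label per text."""
    raise NotImplementedError("replace with real calls to the deployed service")

def extract_model(query_texts, num_labels, epochs=3, lr=2e-5, batch_size=16):
    # 1. Label the attacker's (possibly out-of-domain) queries with the victim.
    victim_labels = query_victim_api(query_texts)

    # 2. Fine-tune a local attack model on (query, victim label) pairs.
    #    The architecture need not match the victim's (architectural mismatch).
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=num_labels)

    enc = tokenizer(query_texts, padding=True, truncation=True, return_tensors="pt")
    dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
                            torch.tensor(victim_labels))
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for input_ids, attention_mask, labels in loader:
            loss = model(input_ids=input_ids, attention_mask=attention_mask,
                         labels=labels).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    return model  # the extracted (imitation) model
```

Under this kind of setup, adversarial examples can then be crafted against the local extracted model and transferred to the victim, which is the second attack the abstract refers to.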