Advances in pre-trained models (e.g., BERT and XLNet) have largely revolutionized the predictive performance of modern natural language processing tasks. This allows corporations to provide machine learning as a service (MLaaS) by encapsulating fine-tuned BERT-based models as commercial APIs. However, previous works have discovered a series of vulnerabilities in BERT-based APIs: for example, they are susceptible to both model extraction attacks and adversarial example transferability attacks. Moreover, owing to the high capacity of BERT-based models, a fine-tuned model can easily overlearn its training data, yet what kind of information can be leaked through the extracted model remains unknown. To bridge this gap, in this work we first present an effective model extraction attack, where the adversary can practically steal a BERT-based API (the target/victim model) by issuing only a limited number of queries. We further develop an effective attribute inference attack to expose sensitive attributes of the training data used by the BERT-based API. Our extensive experiments on benchmark datasets under various realistic settings demonstrate the potential vulnerabilities of BERT-based APIs.
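The model extraction attack described above follows a query-then-distill recipe: the adversary sends attacker-chosen inputs to the black-box API, records its outputs, and trains a local model to imitate them. Below is a minimal, hypothetical sketch of one such training step in Python. All names are illustrative assumptions rather than the paper's implementation: `victim_api` stands in for the commercial BERT-based API (assumed to return per-class probabilities), and the extracted model is an off-the-shelf Hugging Face BERT classifier.

```python
# Sketch of a query-based model extraction step (assumptions, not the
# paper's actual code): query the victim API, then distill its outputs
# into a locally trained BERT classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def victim_api(texts):
    """Placeholder for the black-box BERT-based API under attack.

    The adversary only observes its outputs (here assumed to be softmax
    probabilities per class); this stub is purely for illustration.
    """
    raise NotImplementedError

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
extracted = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(extracted.parameters(), lr=2e-5)

def extraction_step(query_texts):
    # 1. Query the victim with attacker-chosen inputs and record its outputs.
    victim_probs = torch.as_tensor(victim_api(query_texts),
                                   dtype=torch.float)  # (batch, num_labels)
    # 2. Train the extracted model to imitate those outputs
    #    (soft-label distillation via cross-entropy on probabilities).
    batch = tokenizer(query_texts, padding=True, truncation=True,
                      return_tensors="pt")
    logits = extracted(**batch).logits
    loss = torch.nn.functional.cross_entropy(logits, victim_probs)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Repeating this step over a limited query budget yields a local surrogate of the victim model; the attribute inference attack can then be mounted against that surrogate rather than the API itself.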