In model extraction attacks, adversaries can steal a machine learning model exposed via a public API by repeatedly querying it and adjusting their own model based on the obtained predictions. To prevent model stealing, existing defenses focus on detecting malicious queries, or on truncating or distorting outputs, thus necessarily introducing a tradeoff between robustness and model utility for legitimate users. Instead, we propose to impede model extraction by requiring users to complete a proof-of-work before they can read the model's predictions. This deters attackers by greatly increasing (even up to 100x) the computational effort needed to leverage query access for model extraction. Since we calibrate the effort required to complete the proof-of-work to each query, this introduces only a slight overhead for regular users (up to 2x). To achieve this, our calibration applies tools from differential privacy to measure the information revealed by a query. Our method requires no modification of the victim model and can be applied by machine learning practitioners to guard their publicly exposed models against being easily stolen.
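Below is a minimal sketch of the gating idea, assuming a HashCash-style SHA-256 puzzle whose difficulty is scaled by an estimated per-query information cost. The function names, the linear difficulty mapping, and the stubbed privacy-cost value are illustrative assumptions, not the paper's exact calibration, which relies on differential-privacy-style accounting of the information revealed by each query.

```python
import hashlib
import secrets


def pow_difficulty(privacy_cost: float, base_bits: int = 8, scale: float = 4.0) -> int:
    """Map an estimated per-query information (privacy) cost to a number of
    leading zero bits the client must find. This linear mapping is a
    placeholder: the goal is that typical benign queries stay cheap while
    high-information (extraction-like) queries become expensive."""
    return base_bits + int(scale * privacy_cost)


def issue_challenge() -> bytes:
    """Server side: fresh random challenge bound to one prediction request."""
    return secrets.token_bytes(16)


def solve(challenge: bytes, bits: int) -> int:
    """Client side: HashCash-style search for a nonce whose hash has
    `bits` leading zero bits. Expected work grows as 2**bits."""
    target = 1 << (256 - bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    """Server side: checking a solution costs a single hash."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - bits))


# Example flow: the server estimates the information revealed by the query
# (stubbed here), issues a puzzle of matching difficulty, and only returns
# the model's prediction once the client presents a valid nonce.
estimated_cost = 1.5                    # placeholder for the per-query cost estimate
bits = pow_difficulty(estimated_cost)
challenge = issue_challenge()
nonce = solve(challenge, bits)          # done by the querying user
assert verify(challenge, nonce, bits)   # done by the model owner before replying
```

The key property is the asymmetry of the puzzle: solving costs the client on the order of 2**bits hash evaluations, while verification is a single hash, so the server-side overhead of gating each prediction stays negligible.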