Model stealing attacks present a dilemma for public machine learning APIs. To protect financial investments, companies may be forced to withhold important information about their models that could facilitate theft, including uncertainty estimates and prediction explanations. This compromise is harmful not only to users but also to external transparency. Model stealing defenses seek to resolve this dilemma by making models harder to steal while preserving utility for benign users. However, existing defenses perform poorly in practice, either requiring enormous computational overhead or imposing severe utility trade-offs. To meet these challenges, we present a new approach to model stealing defenses called gradient redirection. At the core of our approach is a provably optimal, efficient algorithm for steering an adversary's training updates in a targeted manner. Combined with improvements to surrogate networks and a novel coordinated defense strategy, our gradient redirection defense, called GRAD${}^2$, achieves small utility trade-offs and low computational overhead, outperforming the best prior defenses. Moreover, we demonstrate how gradient redirection enables reprogramming the adversary with arbitrary behavior, which we hope will foster work on new avenues of defense.
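To make the steering idea concrete, the sketch below illustrates gradient redirection in PyTorch. It is a minimal conceptual illustration, not the paper's provably optimal algorithm: it uses iterative gradient descent to perturb a returned posterior so that the cross-entropy gradient an adversary would compute on a surrogate model aligns with a chosen target direction, with an L1 penalty standing in for the utility constraint. All names here (`surrogate`, `target_dir`, `adversary_grad`) are illustrative assumptions, not identifiers from the paper.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: a small surrogate of the adversary's copycat model,
# one query point, and the defender's true posterior for that query.
surrogate = torch.nn.Linear(10, 3)
params = list(surrogate.parameters())
x = torch.randn(1, 10)
y_clean = F.softmax(torch.randn(1, 3), dim=-1)

# Target direction in the surrogate's parameter space that the defender wants
# the adversary's training update to align with (random, for illustration).
target_dir = [torch.randn_like(p) for p in params]

def adversary_grad(y_soft):
    """Gradient the adversary would compute when distilling on (x, y_soft)."""
    # Soft-label cross-entropy between the surrogate's prediction and y_soft.
    loss = -(y_soft * F.log_softmax(surrogate(x), dim=-1)).sum()
    # create_graph=True keeps the graph so we can differentiate the gradient
    # itself with respect to the perturbed posterior.
    return torch.autograd.grad(loss, params, create_graph=True)

# Reparameterize the perturbed posterior through a softmax and optimize it to
# maximize alignment with target_dir while staying close to y_clean.
z = torch.log(y_clean).clone().detach().requires_grad_(True)
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(200):
    y_pert = F.softmax(z, dim=-1)
    g = adversary_grad(y_pert)
    align = sum((gi * ti).sum() for gi, ti in zip(g, target_dir))
    # Negative alignment is minimized; the L1 term preserves utility.
    loss = -align + 10.0 * (y_pert - y_clean).abs().sum()
    opt.zero_grad()
    loss.backward()  # only z is updated; surrogate weights are never stepped
    opt.step()

print("clean posterior:    ", y_clean)
print("perturbed posterior:", F.softmax(z, dim=-1).detach())
```

Where this sketch uses an inner optimization loop, the defense described in the abstract instead solves the steering problem with a provably optimal, efficient algorithm, which is what keeps the computational overhead low.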