Adversarial training instances can severely distort a model's behavior. This work investigates certified regression defenses, which provide guaranteed limits on how much a regressor's prediction may change under a training-set attack. Our key insight is that certified regression reduces to certified classification when using the median as the model's primary decision function. Coupling our reduction with existing certified classifiers, we propose six new provably robust regressors. To the best of our knowledge, this is the first work to certify the robustness of individual regression predictions without any assumptions about the data distribution or model architecture. We also show that existing state-of-the-art certified classifiers often make overly pessimistic assumptions that can degrade their provable guarantees. We introduce a tighter analysis of model robustness, which in many cases yields significantly improved certified guarantees. Lastly, we empirically demonstrate our approaches' effectiveness on both regression and classification data, where the accuracy of up to 50% of test predictions can be guaranteed under 1% training-set corruption and up to 30% of predictions under 4% corruption. Our source code is available at https://github.com/ZaydH/certified-regression.
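To illustrate the median-based intuition behind the reduction, the sketch below shows how a certified bound could be computed for a partition-style ensemble regressor, where each poisoned training instance can affect at most one submodel's prediction. This is a minimal, hypothetical sketch, not the paper's actual construction: the function name `certified_radius`, the tolerance parameter `tol`, and the partition-ensemble setup are all assumptions introduced here for illustration.

```python
def certified_radius(preds, y, tol):
    """Hypothetical sketch: certify a median-of-partitions regression.

    `preds` are the per-partition submodel predictions for one test
    point, `y` the target, and `tol` the accuracy tolerance.  Since
    each poisoned training instance falls in at most one partition, an
    attacker with budget r can alter at most r of the predictions, and
    the corrupted median then lies between the (lo - r)-th and
    (hi + r)-th order statistics of the clean predictions.  Returns the
    largest r for which the median is guaranteed within `tol` of y,
    or -1 if even the unattacked median is inaccurate.
    """
    s = sorted(preds)
    n = len(s)
    lo, hi = (n - 1) // 2, n // 2  # lower/upper median indices
    r = -1
    # Grow the budget while both extreme reachable medians stay accurate.
    while (lo - (r + 1) >= 0 and hi + (r + 1) < n
           and abs(s[lo - (r + 1)] - y) <= tol
           and abs(s[hi + (r + 1)] - y) <= tol):
        r += 1
    return r
```

Because the median only moves between order statistics of the clean predictions, the certificate reduces to checking two quantiles, which is the sense in which a certified regressor behaves like a certified (binary) classifier around the tolerance interval.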