Many recent works have proposed methods to train classifiers with local robustness properties, which can provably eliminate classes of evasion attacks for most inputs, but not all inputs. Since data distribution shift is very common in security applications, e.g., it is often observed in malware detection, local robustness cannot guarantee that the property holds for unseen inputs at the time the classifier is deployed. Therefore, it is more desirable to enforce global robustness properties that hold for all inputs, which is strictly stronger than local robustness. In this paper, we present a framework and tools for training classifiers that satisfy global robustness properties. We define new notions of global robustness that are more suitable for security classifiers. We design a novel booster-fixer training framework to enforce global robustness properties. We structure our classifier as an ensemble of logic rules and design a new verifier to verify the properties. In our training algorithm, the booster increases the classifier's capacity, and the fixer enforces verified global robustness properties following counterexample-guided inductive synthesis. We show that we can train classifiers to satisfy different global robustness properties for three security datasets, and even multiple properties at the same time, with modest impact on the classifier's performance. For example, we train a Twitter spam account classifier to satisfy five global robustness properties, with a 5.4% decrease in true positive rate and a 0.1% increase in false positive rate, compared to a baseline XGBoost model that does not satisfy any property.
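To make the booster-fixer loop concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of a CEGIS-style training loop for a one-dimensional rule-ensemble classifier and a single example global property, monotonicity in the feature x. All names here (Rule, verify_monotone, booster_step, fixer_step) are illustrative assumptions; the real framework handles richer rule ensembles, multiple properties, and a full verifier.

```python
# Hypothetical toy sketch of a booster-fixer CEGIS loop; not the paper's algorithm.
import random

# A rule fires when x >= threshold and contributes `weight` to the score.
Rule = tuple  # (threshold: float, weight: float)

def score(rules, x):
    """Ensemble score: sum of weights of all rules that fire on x."""
    return sum(w for t, w in rules if x >= t)

def verify_monotone(rules):
    """Return None if the score is non-decreasing in x for ALL inputs,
    otherwise return a counterexample pair (x_low, x_high)."""
    for t, w in rules:
        if w < 0:
            # Crossing this threshold strictly decreases the score.
            return (t - 1e-6, t)
    return None

def fixer_step(rules, counterexample):
    """Minimally repair the ensemble so the counterexample is eliminated:
    zero out the negative-weight rule whose threshold the pair crosses."""
    x_low, x_high = counterexample
    return [(t, 0.0 if (x_low < t <= x_high and w < 0) else w)
            for t, w in rules]

def booster_step(rules, data):
    """Add one rule fit to the residual error to increase capacity."""
    t = random.choice([x for x, _ in data])
    residual = sum(y - score(rules, x) for x, y in data if x >= t)
    covered = sum(1 for x, _ in data if x >= t) or 1
    return rules + [(t, residual / covered)]

def train(data, rounds=20):
    rules = []
    for _ in range(rounds):
        rules = booster_step(rules, data)           # booster: add capacity
        while (cex := verify_monotone(rules)):      # verifier: find violation
            rules = fixer_step(rules, cex)          # fixer: repair it
    assert verify_monotone(rules) is None           # property holds globally
    return rules

if __name__ == "__main__":
    random.seed(0)
    # Labels generally increase with x, so monotonicity is a natural property.
    data = [(x, float(x > 5)) for x in range(10)]
    model = train(data)
    print([round(score(model, x), 2) for x in range(10)])
```

The sketch illustrates the division of labor described above: the booster only improves accuracy, the verifier either certifies the global property or returns a counterexample, and the fixer repairs the model until verification succeeds, so the returned classifier satisfies the property for every input by construction.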