Tree ensembles are powerful models that are widely used. However, they are susceptible to adversarial examples, which are inputs that are purposely constructed to elicit a misprediction from the model. This can degrade performance and erode a user's trust in the model. Typical approaches try to alleviate this problem by verifying how robust a learned ensemble is or by robustifying the learning process. We take an alternative approach and attempt to detect adversarial examples in a post-deployment setting. We present a novel method for this task that works by analyzing an unseen example's output configuration, which is the set of predictions made by an ensemble's constituent trees. Our approach works with any additive tree ensemble and does not require training a separate model. We evaluate our approach on three different tree ensemble learners. We empirically show that our method is currently the best adversarial example detection method for tree ensembles.
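To make the notion of an output configuration concrete, the sketch below records the leaf reached in each constituent tree for a single example. It is only an illustrative assumption on our part: it uses scikit-learn's GradientBoostingClassifier and a synthetic dataset rather than the learners or detection procedure evaluated in the paper.

```python
# Minimal sketch (assumption, not the authors' implementation): an "output
# configuration" as the per-tree leaves reached by one example in an
# additive tree ensemble.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
model.fit(X, y)

def output_configuration(model, x):
    """Return the leaf index reached in each constituent tree for example x.

    For a binary GradientBoostingClassifier, model.estimators_ has shape
    (n_estimators, 1); each entry is a DecisionTreeRegressor whose apply()
    gives the index of the leaf that x falls into.
    """
    x = x.reshape(1, -1)
    return np.array([tree.apply(x)[0] for tree in model.estimators_[:, 0]])

config = output_configuration(model, X[0])
print(config)  # one leaf id per tree, e.g. [ 7 11  4 ... ]
```

A detector in the spirit of the paper would then judge how typical such a configuration is relative to those produced by normal (non-adversarial) examples.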