The same method that creates adversarial examples (AEs) to fool image classifiers can be used to generate counterfactual explanations (CEs) that explain algorithmic decisions. This observation has led researchers to regard CEs as AEs by another name. We argue that two properties formally distinguish CEs from AEs: their relationship to the true label and their tolerance with respect to proximity. Based on these arguments, we introduce CEs, AEs, and related concepts mathematically in a common framework. Furthermore, we show connections between current methods for generating CEs and AEs, and we expect the two fields to merge increasingly as the number of common use cases grows.
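To illustrate why one search procedure can serve both purposes, here is a minimal sketch that finds the closest input (in L2 distance) for which a classifier changes its prediction. Everything in it is an assumption made for illustration and is not taken from the paper: the linear classifier, its weights, and the helper names `predict`, `closest_flip`, and `l2` are hypothetical. The code only produces a nearby point with a different prediction; whether that point would count as an AE or a CE depends on its relationship to the true label and on how strictly proximity is required, which the code does not decide.

```python
import math

def predict(w, b, x):
    """Hypothetical linear classifier: class 1 if w.x + b > 0, else class 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0

def closest_flip(w, b, x, overshoot=1e-3):
    """Project x onto the decision boundary along w, then step slightly past it.

    Assumes x does not lie exactly on the boundary (score != 0).
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    step = (1 + overshoot) * score / norm_sq
    return [xi - step * wi for wi, xi in zip(w, x)]

def l2(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Illustrative weights and input (made up for this sketch).
w, b = [1.0, -2.0], 0.5
x = [2.0, 0.5]

x_new = closest_flip(w, b, x)
print("original prediction:", predict(w, b, x))
print("new prediction:     ", predict(w, b, x_new))
print("L2 distance moved:  ", l2(x, x_new))
```

For nonlinear models the closed-form projection is not available, and an iterative search would be needed instead; the linear case is used here only because it keeps the idea visible in a few lines.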