Two fundamental requirements for deploying machine learning models in safety-critical systems are the ability to detect out-of-distribution (OOD) data correctly and the ability to explain the model's predictions. Although significant effort has gone into both OOD detection and explainable AI, little work has addressed explaining why a model predicts that a particular data point is OOD. In this paper, we address this question by introducing the concept of an OOD counterfactual: a perturbed data point that iteratively moves between different OOD categories. We propose a method for generating such counterfactuals, investigate its application to synthetic and benchmark data, and compare it to several benchmark methods using a range of metrics.