Out-of-distribution (OOD) detection plays a crucial role in ensuring the safe deployment of deep neural network (DNN) classifiers. While many methods have focused on improving the performance of OOD detectors, a critical gap remains in interpreting their decisions. We help bridge this gap by providing explanations for OOD detectors based on learned high-level concepts. We first propose two new metrics for assessing how effectively a given set of concepts explains an OOD detector: 1) detection completeness, which quantifies the sufficiency of the concepts for explaining the OOD detector's decisions, and 2) concept separability, which captures the distributional separation between in-distribution and OOD data in the concept space. Based on these metrics, we propose a framework for learning a set of concepts that satisfies the desired properties of detection completeness and concept separability, and we demonstrate the framework's effectiveness in providing concept-based explanations for diverse OOD detection techniques. We also show how to identify the prominent concepts that contribute to the detection results via a modified Shapley value-based importance score.
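The Shapley-value importance score mentioned in the abstract can be illustrated with a minimal sketch. It computes the standard (unmodified) Shapley value by enumerating all coalitions of concepts; it is not the modified score proposed in the paper, whose details the abstract does not give. The concept names and the `toy_score` value function are hypothetical, chosen only so the result is easy to check by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value_fn):
    """Exact Shapley values via full coalition enumeration (cost grows as 2^n)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s.
                total += weight * (value_fn(s | {p}) - value_fn(s))
        phi[p] = total
    return phi


def toy_score(concepts):
    """Hypothetical value function: a stand-in for a detection-quality score
    obtained when only the given subset of concepts is available."""
    base = {"stripes": 0.5, "fur": 0.25, "sky": 0.0}
    score = sum(base[c] for c in concepts)
    # Assumed interaction: the two concepts are worth more together.
    if "stripes" in concepts and "fur" in concepts:
        score += 0.25
    return score


concepts = ["stripes", "fur", "sky"]
phi = shapley_values(concepts, toy_score)
for name in concepts:
    print(f"{name}: {phi[name]:.3f}")
```

In this toy setting the interaction bonus is split evenly between the two interacting concepts, the irrelevant concept receives zero importance, and the scores sum to the value of the full concept set. Exact enumeration is only practical for a handful of concepts; larger sets would typically need sampling-based approximations.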