Visual explanations are crucial for improving the reliability of object detectors. Object detectors identify and localize instances by assessing multiple visual features collectively. If an explanation overlooks these collective influences, it may miss compositional cues or capture spurious correlations. However, existing methods typically focus solely on individual pixel contributions and neglect the collective contribution of multiple pixels. To address this limitation, we propose a game-theoretic method based on Shapley values and interactions that explicitly captures both individual and collective pixel contributions. Our method explains both bounding box localization and class determination, highlighting the regions crucial for each detection. Extensive experiments demonstrate that the proposed method identifies important regions more accurately than state-of-the-art methods. The code is available at https://github.com/tttt-0814/VX-CODE
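The abstract names two game-theoretic quantities: Shapley values, which measure individual contributions, and Shapley interactions, which measure collective ones. The sketch below illustrates them on a tiny, hypothetical "detector score" defined over four image patches. It is not the VX-CODE implementation. The function names, the patch indices, and the toy score are assumptions for illustration only. The interaction shown is the standard pairwise Shapley interaction index. Exact enumeration grows exponentially with the number of players, so this form is only practical for a handful of regions.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, Sequence

# A "game" maps a coalition (set of kept patches) to a scalar score.
Game = Callable[[FrozenSet[int]], float]


def shapley_value(v: Game, players: Sequence[int], i: int) -> float:
    """Exact Shapley value of player i: weighted average marginal contribution."""
    n = len(players)
    others = [p for p in players if p != i]
    total = 0.0
    for size in range(len(others) + 1):
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            s = frozenset(subset)
            total += weight * (v(s | {i}) - v(s))
    return total


def shapley_interaction(v: Game, players: Sequence[int], i: int, j: int) -> float:
    """Exact pairwise Shapley interaction index of players i and j."""
    n = len(players)
    others = [p for p in players if p not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        weight = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
        for subset in combinations(others, size):
            s = frozenset(subset)
            # Joint effect of adding i and j beyond their separate effects.
            delta = v(s | {i, j}) - v(s | {i}) - v(s | {j}) + v(s)
            total += weight * delta
    return total


def toy_detector_score(kept: FrozenSet[int]) -> float:
    """Hypothetical stand-in for a detector's confidence when only `kept` patches are visible."""
    score = 0.0
    if 0 in kept:
        score += 0.1
    if 1 in kept:
        score += 0.1
    if 0 in kept and 1 in kept:
        score += 0.6  # compositional cue: only counts when both patches are present
    if 2 in kept:
        score += 0.2
    return score  # patch 3 is irrelevant


patches = [0, 1, 2, 3]
phi = {p: shapley_value(toy_detector_score, patches, p) for p in patches}
i01 = shapley_interaction(toy_detector_score, patches, 0, 1)
print({p: round(x, 3) for p, x in phi.items()}, round(i01, 3))
```

The toy score gives patches 0 and 1 a joint bonus of 0.6. The Shapley values split this bonus evenly (0.3 each, for totals of 0.4), so they do not show that it arises only from the pair. The interaction index for (0, 1) recovers the full 0.6, and the index for (0, 2) is zero. This is the kind of collective contribution that per-pixel attributions cannot isolate.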