To interpret deep neural networks, one main approach is to dissect the visual input and find the prototypical parts responsible for the classification. However, existing methods often ignore the hierarchical relationship between these prototypes, and thus cannot explain semantic concepts at both a higher level (e.g., water sports) and a lower level (e.g., swimming). In this paper, inspired by the human cognition system, we leverage hierarchical information to deal with uncertainty: when we observe water and human activity but no definitive action, the instance can be recognized as the parent class water sports; only after observing a person swimming can we definitively refine it to the swimming action. To this end, we propose the HIerarchical Prototype Explainer (HIPE) to build hierarchical relations between prototypes and classes. HIPE enables a reasoning process for video action classification by dissecting the input video frames at multiple levels of the class hierarchy; the method is also applicable to other video tasks. The faithfulness of our method is verified by reducing the accuracy-explainability trade-off on ActivityNet and UCF-101 while providing multi-level explanations.
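To make the idea of multi-level prototype reasoning concrete, the sketch below shows a minimal two-level prototype head: spatio-temporal parts of a video are compared with parent-level prototypes (e.g., water sports) and child-level prototypes (e.g., swimming), and each level produces its own class scores. This is only an illustration under assumed design choices (cosine similarity, max-pooling over parts, the class/prototype counts, and all names such as `HierarchicalPrototypeHead`), not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalPrototypeHead(nn.Module):
    """Toy two-level prototype classifier (illustrative, not the HIPE model)."""

    def __init__(self, feat_dim, n_parent_protos, n_child_protos,
                 n_parent_classes, n_child_classes):
        super().__init__()
        # Learnable prototype vectors for each level of the class hierarchy.
        self.parent_protos = nn.Parameter(torch.randn(n_parent_protos, feat_dim))
        self.child_protos = nn.Parameter(torch.randn(n_child_protos, feat_dim))
        # Linear layers map prototype similarities to class scores per level.
        self.parent_fc = nn.Linear(n_parent_protos, n_parent_classes, bias=False)
        self.child_fc = nn.Linear(n_child_protos, n_child_classes, bias=False)

    def _similarity(self, feats, protos):
        # feats: (batch, n_parts, feat_dim) spatio-temporal parts of a video.
        # Cosine similarity of every part to every prototype, then max-pool
        # over parts so each prototype reports its best-matching video part.
        feats = F.normalize(feats, dim=-1)
        protos = F.normalize(protos, dim=-1)
        sim = feats @ protos.t()          # (batch, n_parts, n_protos)
        return sim.max(dim=1).values      # (batch, n_protos)

    def forward(self, feats):
        parent_logits = self.parent_fc(self._similarity(feats, self.parent_protos))
        child_logits = self.child_fc(self._similarity(feats, self.child_protos))
        return parent_logits, child_logits


# Toy usage: 2 videos, 8 spatio-temporal parts each, 512-d features.
head = HierarchicalPrototypeHead(512, n_parent_protos=10, n_child_protos=40,
                                 n_parent_classes=5, n_child_classes=20)
parent_logits, child_logits = head(torch.randn(2, 8, 512))
```

In this kind of setup, a confident parent-level prediction (water sports) can be reported even when the child-level scores (swimming vs. diving) remain ambiguous, mirroring the coarse-to-fine reasoning described above.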