In real-world scenarios, human actions often fall outside the distribution of the training data, which requires a model to both recognize known actions and reject unknown ones. Unlike image data, video actions are more challenging to recognize in an open-set setting due to the uncertain temporal dynamics and the static bias of human actions. In this paper, we propose a Deep Evidential Action Recognition (DEAR) method to recognize actions in an open test set. Specifically, we formulate the action recognition problem from the evidential deep learning (EDL) perspective and propose a novel model calibration method to regularize the EDL training. In addition, to mitigate the static bias of video representations, we propose a plug-and-play module that debiases the learned representation through contrastive learning. Experimental results show that our DEAR method achieves consistent performance gains on multiple mainstream action recognition models and benchmarks. Code and pre-trained models are available at {\small\url{https://www.rit.edu/actionlab/dear}}.
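For context, evidential deep learning in its standard formulation (Sensoy et al., 2018) replaces the softmax with non-negative evidence outputs $e_k \geq 0$ over $K$ classes, which parameterize a Dirichlet distribution over class probabilities; the resulting predictive uncertainty provides a natural rejection score for unknown actions. A minimal sketch of this formulation follows (the specific evidence function, loss, and calibration used in DEAR may differ):
\begin{equation}
\alpha_k = e_k + 1, \qquad S = \sum_{k=1}^{K} \alpha_k, \qquad \hat{p}_k = \frac{\alpha_k}{S}, \qquad u = \frac{K}{S},
\end{equation}
where $\hat{p}_k$ is the expected class probability under the Dirichlet and $u \in (0, 1]$ is the vacuity uncertainty; a test video whose $u$ exceeds a chosen threshold can be rejected as an unknown action, while low-uncertainty inputs are classified by $\arg\max_k \hat{p}_k$.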