In still-image human action recognition, existing studies have mainly used extra bounding-box information along with class labels to compensate for the lack of temporal information in still images; however, preparing extra data with manual annotation is time-consuming and prone to human error. Moreover, existing studies have not addressed action recognition under a long-tailed class distribution. In this paper, we propose a two-phase multi-expert classification method for human action recognition that copes with long-tailed distributions by means of super-class learning and without any extra information. To choose the best configuration for each super-class and to characterize inter-class dependency between different action classes, we propose a novel Graph-Based Class Selection (GCS) algorithm. In the proposed approach, a coarse-grained phase selects the most relevant fine-grained experts. The fine-grained experts then encode the intricate details within each super-class so that the inter-class variation increases. Extensive experimental evaluations are conducted on several public human action recognition datasets: Stanford40, Pascal VOC 2012 Action, BU101+, and IHAR. The experimental results demonstrate that the proposed method yields promising improvements. Specifically, on the IHAR, Stanford40, Pascal VOC 2012 Action, and BU101+ benchmarks, the proposed approach outperforms state-of-the-art studies by 8.92%, 0.41%, 0.66%, and 2.11%, respectively, with much less computational cost and without any auxiliary annotation information. In addition, the results show that, for action recognition with a long-tailed distribution, the proposed method outperforms its counterparts by a significant margin.
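The coarse-to-fine routing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how expert outputs are combined, so weighting each expert's probabilities by its super-class probability is an assumption here, as are the names `two_phase_predict`, `coarse_model`, `experts`, and `top_k`. The super-class grouping is taken as given (the GCS algorithm that would produce it is not shown), and the toy models and labels are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_phase_predict(features, coarse_model, experts, super_classes, top_k=1):
    """Route a sample through a coarse model, then through the selected experts.

    coarse_model(features) returns one raw score per super-class.
    experts[i](features) returns one raw score per label in super_classes[i].
    Assumption: the final class score is P(super-class) * P(class | super-class).
    """
    coarse_probs = softmax(coarse_model(features))
    # Phase 1: keep the top_k most probable super-classes.
    ranked = sorted(range(len(coarse_probs)),
                    key=lambda i: coarse_probs[i], reverse=True)[:top_k]
    class_scores = {}
    # Phase 2: only the selected fine-grained experts are evaluated.
    for i in ranked:
        fine_probs = softmax(experts[i](features))
        for label, p in zip(super_classes[i], fine_probs):
            class_scores[label] = coarse_probs[i] * p
    best = max(class_scores, key=class_scores.get)
    return best, class_scores


# Hypothetical toy setup: two super-classes with two action labels each.
super_classes = [["riding_bike", "riding_horse"], ["cooking", "washing_dishes"]]
coarse_model = lambda x: [x[0], x[1]]          # stand-in for a coarse classifier
experts = [lambda x: [x[2], -x[2]],            # stand-in expert for super-class 0
           lambda x: [x[3], -x[3]]]            # stand-in expert for super-class 1

features = [2.0, 0.0, -1.0, 3.0]
best, scores = two_phase_predict(features, coarse_model, experts, super_classes)
all_best, all_scores = two_phase_predict(features, coarse_model, experts,
                                         super_classes, top_k=2)
print(best, scores)
```

With `top_k=1` only one expert runs per sample, which is one plausible reading of how a multi-expert design could keep inference cost low; setting `top_k` to the number of super-classes scores every class and yields a full probability distribution.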