Human action recognition aims to classify the category of human action from a video segment. Recently, much effort has gone into designing GCN-based models that extract features from skeletons for this task, because skeleton representations are far more efficient and robust than other modalities such as RGB frames. However, skeleton data also discards important cues such as interacting objects, producing ambiguous actions that are hard to distinguish and prone to misclassification. To alleviate this problem, we propose an auxiliary feature refinement head (FR Head), consisting of spatial-temporal decoupling and contrastive feature refinement, to obtain discriminative skeleton representations. Ambiguous samples are dynamically discovered and calibrated in the feature space. Furthermore, FR Head can be imposed on different stages of GCNs to build a multi-level refinement for stronger supervision. Extensive experiments are conducted on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets. Our proposed models obtain competitive results against state-of-the-art methods and help to discriminate those ambiguous samples. Code is available at https://github.com/zhysora/FR-Head.
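To make the idea of contrastive feature refinement more concrete, below is a minimal NumPy sketch of how ambiguous samples could be discovered and calibrated relative to per-class prototypes in feature space. This is an illustrative assumption, not the authors' implementation: the function names, the prototype-based formulation, and the margin loss are all invented here for exposition.

```python
import numpy as np

# Illustrative sketch only (assumed formulation, not the paper's FR Head code):
# ambiguous samples are those whose nearest class prototype disagrees with
# their ground-truth label; a margin loss then pulls them toward their own
# prototype and pushes them away from the nearest competing one.

def class_prototypes(feats, labels, num_classes):
    """Mean feature vector per class (feats: [N, D], labels: [N])."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])

def ambiguous_mask(feats, labels, protos):
    """Flag samples whose nearest prototype differs from the true label."""
    # Pairwise Euclidean distances from each sample to each prototype: [N, C]
    d = np.linalg.norm(feats[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1) != labels

def contrastive_refine_loss(feats, labels, protos, margin=1.0):
    """Margin loss: own-prototype distance minus nearest-other distance."""
    d = np.linalg.norm(feats[:, None, :] - protos[None, :, :], axis=-1)
    pos = d[np.arange(len(labels)), labels]      # distance to own prototype
    d_other = d.copy()
    d_other[np.arange(len(labels)), labels] = np.inf
    neg = d_other.min(axis=1)                    # distance to nearest other prototype
    return np.maximum(0.0, pos - neg + margin).mean()
```

In a multi-level setup, such a head would be attached to intermediate GCN stages as well as the final one, with its loss added as auxiliary supervision during training and discarded at inference.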