Human action recognition aims to classify the category of a human action from a video segment. Recently, much effort has been devoted to designing GCN-based models that extract features from skeletons for this task, since skeleton representations are more efficient and robust than other modalities such as RGB frames. However, skeleton data also discards important clues such as interacting objects, which results in ambiguous actions that are hard to distinguish and tend to be misclassified. To alleviate this problem, we propose an auxiliary feature refinement head (FR Head), which consists of spatial-temporal decoupling and contrastive feature refinement, to obtain discriminative representations of skeletons. Ambiguous samples are dynamically discovered and calibrated in the feature space. Furthermore, FR Head can be imposed on different stages of GCNs to build a multi-level refinement for stronger supervision. Extensive experiments are conducted on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets. Our proposed models obtain competitive results against state-of-the-art methods and help to discriminate those ambiguous samples.
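The ideas of dynamically discovering ambiguous samples and calibrating them in the feature space can be illustrated with a minimal sketch. This is not the paper's implementation; the function names (`discover_ambiguous`, `refine_loss`), the margin heuristic, and the triplet-style hinge term are illustrative assumptions.

```python
# Hypothetical sketch of contrastive feature refinement on per-sample features.
# All names and thresholds are illustrative, not the paper's actual API.
import numpy as np

def class_prototypes(feats, labels, num_classes):
    """Mean feature vector per class, used as a class anchor."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])

def discover_ambiguous(feats, labels, protos, margin=0.1):
    """Mark a sample as ambiguous when its distance to the nearest *wrong*
    prototype is within `margin` of its distance to the true-class prototype."""
    d = np.linalg.norm(feats[:, None, :] - protos[None, :, :], axis=2)  # (N, C)
    idx = np.arange(len(labels))
    d_true = d[idx, labels]
    d_wrong = d.copy()
    d_wrong[idx, labels] = np.inf  # exclude the true class
    return (d_wrong.min(axis=1) - d_true) < margin

def refine_loss(feats, labels, protos, mask, margin=1.0):
    """Triplet-style calibration: pull ambiguous samples toward their class
    prototype and push them away from the closest confusing prototype."""
    if not mask.any():
        return 0.0
    f, y = feats[mask], labels[mask]
    idx = np.arange(len(y))
    d = np.linalg.norm(f[:, None, :] - protos[None, :, :], axis=2)
    d_true = d[idx, y]
    d[idx, y] = np.inf
    return float(np.maximum(d_true - d.min(axis=1) + margin, 0.0).mean())
```

In a multi-level setting, a loss of this kind would be computed on the intermediate features of several GCN stages and summed with the classification loss.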