In this paper, we propose a few-shot learning pipeline for 3D skeleton-based action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt (JEANIE). To factor out misalignment between query and support sequences of 3D body joints, we propose an advanced variant of Dynamic Time Warping that jointly models each smooth path between the query and support frames, simultaneously achieving the best alignment in the temporal and simulated camera viewpoint spaces for end-to-end learning under limited few-shot training data. Sequences are encoded with a temporal block encoder based on Simple Spectral Graph Convolution, a lightweight linear Graph Neural Network backbone (we also include a setting with a transformer). Finally, we propose a similarity-based loss that encourages the alignment of sequences of the same class while preventing the alignment of unrelated sequences. We demonstrate state-of-the-art results on NTU-60, NTU-120, Kinetics-skeleton and UWA3D Multiview Activity II.
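The joint alignment idea can be illustrated with a small dynamic program. The sketch below is not the authors' implementation: it assumes a single discrete viewpoint index (rather than a full viewpoint space), uses a hard minimum instead of a differentiable relaxation suited to end-to-end training, and takes squared Euclidean distance as the base cost. The names `pairwise_sq_dist`, `joint_alignment_cost` and `max_view_step` are hypothetical. Path smoothness is modeled by letting the viewpoint index change by at most `max_view_step` between consecutive steps of the warping path.

```python
import math


def pairwise_sq_dist(query_views, support):
    """Build D[t][s][v]: squared distance between query block t rendered
    under simulated viewpoint v and support block s.

    query_views[t][v] and support[s] are feature vectors (lists of floats).
    """
    return [[[sum((a - b) ** 2 for a, b in zip(q_feat, s_feat))
              for q_feat in q_t]
             for s_feat in support]
            for q_t in query_views]


def joint_alignment_cost(D, max_view_step=1):
    """DTW-style minimum path cost over (query time, support time, viewpoint).

    Assumption: the path may start and end at any viewpoint, advances in time
    with the usual DTW steps, and may shift the viewpoint index by at most
    max_view_step per step.
    """
    T, S, V = len(D), len(D[0]), len(D[0][0])
    R = [[[math.inf] * V for _ in range(S)] for _ in range(T)]
    R[0][0] = list(D[0][0])  # free choice of the starting viewpoint

    for t in range(T):
        for s in range(S):
            if t == 0 and s == 0:
                continue
            for v in range(V):
                best = math.inf
                # Classic DTW temporal predecessors.
                for dt, ds in ((1, 0), (0, 1), (1, 1)):
                    pt, ps = t - dt, s - ds
                    if pt < 0 or ps < 0:
                        continue
                    # Only nearby viewpoints are allowed as predecessors.
                    lo = max(0, v - max_view_step)
                    hi = min(V, v + max_view_step + 1)
                    for pv in range(lo, hi):
                        best = min(best, R[pt][ps][pv])
                R[t][s][v] = D[t][s][v] + best

    return min(R[T - 1][S - 1])  # free choice of the final viewpoint
```

With a single viewpoint (`V == 1`) the recursion reduces to ordinary Dynamic Time Warping; adding viewpoints can only lower or preserve the cost, since the path gains extra freedom to pick better-matching views.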