This paper addresses dynamic gesture recognition for human-machine interaction. We propose a model consisting of two sub-networks: a transformer and a recurrent neural network (RNN) based on ordered-neuron long short-term memory (ON-LSTM). Each sub-network is trained to recognize gestures using only skeleton joints. Because the two architectures differ, each sub-network extracts different types of features, so knowledge can be shared between them. Through knowledge distillation, the features and predictions from each sub-network are fused into a new fusion classifier. In addition, a cyclical learning rate can be used to generate a series of models that are combined in an ensemble to yield a more generalizable prediction. The proposed ensemble of knowledge-sharing models achieves an overall accuracy of 86.11% using only skeleton information, as tested on the Dynamic Hand Gesture-14/28 dataset.
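The abstract does not specify how the cyclical learning rate or the ensemble combination is implemented. As a rough illustration only, the sketch below assumes a cosine schedule that restarts every cycle (in the style of snapshot ensembles), one saved model per cycle, and plain averaging of class probabilities. All function names, the schedule shape, and the example probabilities are hypothetical and not taken from the paper.

```python
import math


def cyclic_cosine_lr(step, steps_per_cycle, lr_max):
    """Assumed schedule: cosine decay from lr_max toward 0, restarting each cycle."""
    pos = (step % steps_per_cycle) / steps_per_cycle  # position within cycle, in [0, 1)
    return 0.5 * lr_max * (math.cos(math.pi * pos) + 1.0)


def snapshot_steps(total_steps, steps_per_cycle):
    """Steps at which a model snapshot would be saved: the last step of each cycle,
    where the learning rate is lowest."""
    return [s for s in range(total_steps) if (s + 1) % steps_per_cycle == 0]


def ensemble_average(prob_lists):
    """Average the class-probability vectors predicted by several snapshots."""
    n = len(prob_lists)
    return [sum(column) / n for column in zip(*prob_lists)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up probabilities from three hypothetical snapshots for one gesture sample.
snapshot_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]
averaged = ensemble_average(snapshot_probs)
predicted_class = argmax(averaged)
```

In this toy case the first snapshot alone would pick class 0, while the averaged prediction picks class 1, which is the kind of smoothing an ensemble is meant to provide. Whether the paper averages probabilities, votes, or weights its models is not stated in the abstract.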