Continuous mid-air hand gesture recognition from captured hand pose streams is fundamental for human-computer interaction, particularly in AR/VR. However, many methods proposed for recognizing heterogeneous hand gestures are evaluated only on the classification task, and real-time, low-latency gesture segmentation in a continuous stream is not well addressed in the literature. For this task, we propose the On-Off deep Multi-View Multi-Task paradigm (OO-dMVMT). The idea is to exploit multiple time-local views of hand pose and movement to generate rich gesture descriptions, and to use heterogeneous tasks to achieve high accuracy. OO-dMVMT extends the classical MVMT paradigm, in which every task must be active at all times, by allowing specific tasks to switch on or off depending on whether they apply to the input. We show that OO-dMVMT sets a new state of the art (SotA) in continuous/online 3D skeleton-based gesture recognition in terms of gesture classification accuracy, segmentation accuracy, false positives, and decision latency, while maintaining real-time operation.
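As a rough illustration of the on/off idea, the sketch below combines per-sample task losses while skipping any task that does not apply to a given input. It is a minimal sketch under stated assumptions, not the OO-dMVMT implementation: the function name `on_off_multitask_loss`, the task names, the use of `None` to mark an inactive task, and the per-task averaging are all hypothetical choices made for this example.

```python
from typing import Dict, List, Optional


def on_off_multitask_loss(
    per_task_losses: Dict[str, List[Optional[float]]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine per-sample task losses, skipping tasks that are switched off.

    per_task_losses maps a task name to one entry per sample in the batch:
    a loss value when the task applies to that sample, or None when the
    task is off for it (an assumed convention for this sketch).
    """
    total = 0.0
    for task, losses in per_task_losses.items():
        # Keep only the samples for which this task is switched on.
        active = [value for value in losses if value is not None]
        if not active:
            continue  # task is off for the whole batch and contributes nothing
        weight = 1.0 if weights is None else weights.get(task, 1.0)
        # Average over active samples so inactive ones do not dilute the loss.
        total += weight * sum(active) / len(active)
    return total


# Hypothetical batch of three input windows with made-up loss values.
batch = {
    "classification": [0.5, 1.5, 1.0],        # always-on task
    "start_regression": [None, 0.25, None],   # on for one window only
    "end_regression": [None, None, None],     # off for the whole batch
}
loss = on_off_multitask_loss(batch)
```

In a classical multi-task setup every task would contribute a loss term for every sample; here a task that is off simply drops out of the sum, so no placeholder target has to be invented for inputs where the task is undefined.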