In-place gesture-based virtual locomotion techniques enable users to control their viewpoint and move intuitively in a 3D virtual environment. A key research problem is to recognize in-place gestures accurately and quickly, since they trigger specific movements of the virtual viewpoint and shape the user experience. However, to achieve a real-time experience, only short-term sensor sequence data (up to about 300 ms, 6 to 10 frames) can be taken as input, which limits classification performance because so little spatio-temporal information is available. In this paper, we propose a novel long-term memory augmented network for in-place gesture classification. In the training phase, it takes as input both short-term gesture sequence samples and their corresponding long-term sequence samples, which provide extra relevant spatio-temporal information. We store long-term sequence features in an external memory queue. In addition, we design a memory augmented loss that helps cluster features of the same class and push apart features from different classes, thus enabling our memory queue to memorize more relevant long-term sequence features. In the inference phase, we input only short-term sequence samples, use them to recall the stored features, and fuse the two to predict the gesture class. We create a large-scale in-place gesture dataset from 25 participants with 11 gestures. Our method achieves a promising accuracy of 95.1% with a latency of 192 ms, and an accuracy of 97.3% with a latency of 312 ms, and is demonstrated to be superior to recent in-place gesture classification techniques. A user study also validates our approach. Our source code and dataset will be made available to the community.
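The store-and-recall idea can be illustrated with a minimal sketch. This is not the network described above: the abstract does not specify how features are recalled or fused, so the sketch assumes dot-product softmax attention for recall and plain concatenation for fusion. The names `MemoryQueue`, `recall`, and `fuse` are hypothetical, and the feature vectors are toy values standing in for learned features.

```python
import math
from collections import deque


class MemoryQueue:
    """Fixed-size FIFO store of long-term feature vectors (hypothetical helper)."""

    def __init__(self, capacity):
        # A deque with maxlen evicts the oldest entry once capacity is reached.
        self.items = deque(maxlen=capacity)

    def push(self, feature):
        self.items.append(list(feature))

    def recall(self, query, temperature=1.0):
        """Return a softmax-weighted average of the stored features.

        Assumption: weights come from the dot product between the
        short-term query feature and each stored long-term feature.
        """
        if not self.items:
            raise ValueError("memory is empty")
        scores = [sum(q * m for q, m in zip(query, mem)) / temperature
                  for mem in self.items]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        return [sum(w * mem[i] for w, mem in zip(weights, self.items))
                for i in range(len(query))]


def fuse(short_feature, recalled_feature):
    """Assumed fusion: concatenate the two vectors for a classifier head."""
    return list(short_feature) + list(recalled_feature)


# Training time: long-term features are pushed into the queue.
memory = MemoryQueue(capacity=2)
memory.push([1.0, 0.0])
memory.push([0.0, 1.0])
memory.push([0.0, 2.0])  # the queue is full, so [1.0, 0.0] is evicted

# Inference time: only a short-term feature is available.
short = [0.0, 1.0]
recalled = memory.recall(short)
fused = fuse(short, recalled)
print(len(memory.items), len(fused))
```

The bounded queue keeps memory cost constant during training, and recall needs only the short-term feature, which matches the constraint that long-term sequences are unavailable at inference time.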