Walking in place as a means of moving through virtual environments has attracted considerable attention recently. Recent attempts have focused on training a classifier to recognize certain gesture patterns (e.g., standing, walking) using neural networks such as CNNs or LSTMs. However, these methods often consider only a few gesture types and/or introduce undesirable latency in virtual environments. In this paper, we propose a novel framework for accurate and efficient classification of in-place gestures. Our key idea is to treat several consecutive frames as a "point cloud". The HMD and two VIVE trackers provide three points in each frame, with each point carrying a 12-dimensional feature vector (i.e., three-dimensional position, velocity, rotation, and angular velocity). We create a dataset consisting of 9 gesture classes for virtual in-place locomotion. In addition to the supervised point-based network, we also incorporate unsupervised domain adaptation to account for inter-person variations. To this end, we develop an end-to-end joint framework that combines a supervised loss for point-based gesture classification with an unsupervised loss for domain adaptation. Experiments demonstrate that our approach achieves very promising results, with high overall classification accuracy (95.0%) and real-time performance (192 ms latency). Our code will be publicly available at: https://github.com/ZhaoLizz/PCT-MCD.
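As a minimal sketch of the input representation described above: each frame contributes three tracked points (one from the HMD and one from each VIVE tracker), each with a 12-dimensional feature vector, and a window of consecutive frames is flattened into one unordered point set. The assumption that the 12 features decompose as 3-D position + 3-D velocity + 3-D rotation + 3-D angular velocity, the window length, and all function names are hypothetical, not taken from the paper.

```python
import numpy as np

def frames_to_point_cloud(frames):
    """Stack T consecutive tracking frames into a single "point cloud".

    frames: array of shape (T, 3, 12) -- T frames, 3 tracked points per
            frame (HMD + two VIVE trackers), each point a 12-D feature
            vector (assumed here to be 3-D position, velocity, rotation,
            and angular velocity).
    Returns a (T * 3, 12) array treated as one unordered point set.
    """
    frames = np.asarray(frames, dtype=np.float32)
    T, num_points, feat_dim = frames.shape
    assert num_points == 3 and feat_dim == 12
    return frames.reshape(T * num_points, feat_dim)

# Usage: a hypothetical window of 8 frames becomes a 24-point cloud.
window = np.random.randn(8, 3, 12)
cloud = frames_to_point_cloud(window)
print(cloud.shape)  # (24, 12)
```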
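The joint training objective could take the form sketched below: a supervised cross-entropy loss on labeled source-person gestures plus an unsupervised term on unlabeled target-person gestures. The repository name (PCT-MCD) suggests Maximum Classifier Discrepancy as the adaptation method, but that is an inference rather than a statement from the abstract; the L1 discrepancy between two classifier heads shown here follows the standard MCD formulation, and all names and the weight `lam` are hypothetical.

```python
import torch
import torch.nn.functional as F

def joint_loss(src_logits, src_labels, tgt_logits_a, tgt_logits_b, lam=1.0):
    """Joint objective: supervised loss on labeled source gestures plus an
    unsupervised domain-adaptation term on unlabeled target gestures.

    The unsupervised term is an MCD-style L1 discrepancy between two
    classifier heads evaluated on target data (an assumption based on the
    repository name; the paper may use a different adaptation loss).
    """
    # Supervised cross-entropy over the 9 gesture classes (source person).
    sup = F.cross_entropy(src_logits, src_labels)
    # Unsupervised discrepancy between the two heads on the target person.
    disc = (F.softmax(tgt_logits_a, dim=1)
            - F.softmax(tgt_logits_b, dim=1)).abs().mean()
    return sup + lam * disc
```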