Reliable and robust user identification and authentication are important and often necessary requirements for many digital services. They become paramount in social virtual reality (VR), where trust must be established in digital encounters with lifelike avatars that faithfully replicate real persons. Recent research has shown great interest in new solutions that verify the identity of users of extended reality (XR) systems. This paper compares machine learning approaches for identifying users based on arbitrary sequences of head and hand movements, a data stream provided by the majority of today's XR systems. We compare three potential representations of the motion data from heads and hands (scene-relative, body-relative, and body-relative velocities) and five machine learning architectures (random forest, multilayer perceptron, fully recurrent neural network, long short-term memory, gated recurrent unit). We use the publicly available dataset "Talking with Hands" and publish all our code for models, training, and evaluation to allow reproducibility and to provide baselines for future work. After hyperparameter optimization, the combination of a long short-term memory architecture and body-relative data outperformed the competing combinations: the model correctly identifies any of the 34 subjects with an accuracy of 100\% within 150 seconds. Altogether, our approach provides an effective foundation for behaviometric-based identification and authentication to guide researchers and practitioners.
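To illustrate the difference between the representations compared above, the following minimal sketch (our own illustration, not the published code; the exact transform used in the paper may differ) converts a scene-relative hand position into a body-relative one by removing the head's position and yaw, and derives body-relative velocities as frame-to-frame differences:

```python
import math

def to_body_relative(head_pos, head_yaw, hand_pos):
    """Express a scene-relative hand position (x, y, z) in the head's frame:
    subtract the head position, then undo the head's yaw (rotation about
    the vertical y-axis). Yaw-only removal is an assumption for illustration."""
    dx = hand_pos[0] - head_pos[0]
    dy = hand_pos[1] - head_pos[1]
    dz = hand_pos[2] - head_pos[2]
    c, s = math.cos(-head_yaw), math.sin(-head_yaw)
    # Rotate the horizontal (x, z) components; the vertical component is unchanged.
    return (c * dx + s * dz, dy, -s * dx + c * dz)

def to_velocities(samples, dt):
    """Body-relative velocities: frame-to-frame differences divided by the
    sampling timestep dt (e.g. 1/90 s for a 90 Hz tracker)."""
    return [tuple((b[i] - a[i]) / dt for i in range(3))
            for a, b in zip(samples, samples[1:])]
```

For example, a hand one meter in front of a head that has turned 90 degrees maps back to the head's "straight ahead" axis, so the feature no longer encodes where in the room the user stands, only how the hands move relative to the body.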