Realistic and smooth full-body tracking is crucial for immersive AR/VR applications. Existing systems primarily track the head and hands via Head Mounted Devices (HMDs) and controllers, leaving the 3D full-body reconstruction incomplete. One potential approach is to use a Neural Network (NN) model to generate full-body motions from the sparse inputs collected by these limited sensors. In this paper, we propose a novel method based on a multi-layer perceptron (MLP) backbone enhanced with residual connections and a new NN component called Memory-Block. In particular, Memory-Block represents missing sensor data with trainable code vectors, which are combined with the sparse signals from previous time instances to improve temporal consistency. Furthermore, we formulate our solution as a multi-task learning problem, allowing our MLP backbone to learn robust representations that boost accuracy. Our experiments show that our method outperforms state-of-the-art baselines by substantially reducing prediction errors. Moreover, it runs at 72 FPS on mobile HMDs, improving the trade-off between accuracy and running time.
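To make the described architecture more concrete, the following is a minimal pure-Python sketch of how its components might fit together. It is written under our own assumptions and is not the authors' implementation. Every name here is hypothetical: `MemoryBlockSketch`, `ResidualMLPBlock`, `SparseToFullBodySketch`, and `multi_task_loss`. The same goes for the input sizes, the zero-padding of the history window, the block widths, and the two example tasks (`"pose"` and `"aux"`), since the abstract does not specify any of them. The learned code vectors stand in for the missing sensors and are concatenated with a sliding window of past sparse frames. A shared residual MLP then feeds one linear head per task (hard parameter sharing), and training would minimize a weighted sum of per-task losses. Weights are randomly initialized and never trained here.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def init_linear(rng: random.Random, n_in: int, n_out: int) -> Tuple[Matrix, Vector]:
    """Random weights and zero bias for a dense layer (hypothetical init scheme)."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def linear(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Dense layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class MemoryBlockSketch:
    """Hypothetical Memory-Block: one learnable code vector per missing sensor,
    concatenated with a sliding window of past sparse observations."""

    def __init__(self, sparse_dim: int, num_missing: int, code_dim: int,
                 history: int, seed: int = 0):
        assert history >= 1
        rng = random.Random(seed)
        # In a real model these codes would be trained by backprop; here they stay fixed.
        self.codes = [[rng.gauss(0.0, 0.02) for _ in range(code_dim)]
                      for _ in range(num_missing)]
        self.sparse_dim, self.code_dim, self.history = sparse_dim, code_dim, history
        self.buffer: List[Vector] = []

    @property
    def out_dim(self) -> int:
        return self.history * self.sparse_dim + len(self.codes) * self.code_dim

    def __call__(self, sparse_frame: Vector) -> Vector:
        assert len(sparse_frame) == self.sparse_dim
        # Keep only the most recent `history` frames (current one included).
        self.buffer = (self.buffer + [list(sparse_frame)])[-self.history:]
        # Zero-pad at the front until enough frames have been seen (assumption).
        pad = [[0.0] * self.sparse_dim for _ in range(self.history - len(self.buffer))]
        flat_window = [v for frame in pad + self.buffer for v in frame]
        flat_codes = [v for code in self.codes for v in code]
        return flat_window + flat_codes


class ResidualMLPBlock:
    """x + W2 relu(W1 x + b1) + b2: a residual connection around a 2-layer MLP."""

    def __init__(self, dim: int, hidden: int, rng: random.Random):
        self.W1, self.b1 = init_linear(rng, dim, hidden)
        self.W2, self.b2 = init_linear(rng, hidden, dim)

    def __call__(self, x: Vector) -> Vector:
        delta = linear(relu(linear(x, self.W1, self.b1)), self.W2, self.b2)
        return [xi + di for xi, di in zip(x, delta)]


class SparseToFullBodySketch:
    """Memory-Block -> input projection -> residual MLP blocks -> one head per task."""

    def __init__(self, sparse_dim: int, num_missing: int, code_dim: int, history: int,
                 width: int, num_blocks: int, task_dims: Dict[str, int], seed: int = 0):
        rng = random.Random(seed)
        self.memory = MemoryBlockSketch(sparse_dim, num_missing, code_dim, history, seed)
        self.W_in, self.b_in = init_linear(rng, self.memory.out_dim, width)
        self.blocks = [ResidualMLPBlock(width, 2 * width, rng) for _ in range(num_blocks)]
        # Hard parameter sharing: every task head reads the same backbone features.
        self.heads = {name: init_linear(rng, width, d) for name, d in task_dims.items()}

    def __call__(self, sparse_frame: Vector) -> Dict[str, Vector]:
        h = relu(linear(self.memory(sparse_frame), self.W_in, self.b_in))
        for block in self.blocks:
            h = block(h)
        return {name: linear(h, W, b) for name, (W, b) in self.heads.items()}


def multi_task_loss(preds: Dict[str, Vector], targets: Dict[str, Vector],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of per-task mean squared errors."""
    total = 0.0
    for name, w in weights.items():
        p, t = preds[name], targets[name]
        total += w * sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(p)
    return total


if __name__ == "__main__":
    model = SparseToFullBodySketch(sparse_dim=9, num_missing=5, code_dim=4, history=3,
                                   width=16, num_blocks=2,
                                   task_dims={"pose": 12, "aux": 6})
    for t in range(4):  # stream a few fake sparse frames, one per time step
        preds = model([0.1 * t] * 9)
    print({name: len(v) for name, v in preds.items()})  # prints {'pose': 12, 'aux': 6}
```

Because the model keeps its own frame buffer, it is called once per time step with only the current sparse frame. In this sketch, temporal context comes entirely from the Memory-Block window and not from a recurrent state. That choice keeps the backbone a plain feed-forward MLP.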