Multi-person motion capture can be challenging due to ambiguities caused by severe occlusion, fast body movement, and complex interactions. Existing frameworks build on 2D pose estimates and triangulate them to 3D coordinates by reasoning about the appearance, trajectory, and geometric consistency among multi-camera observations. However, 2D joint detections are usually incomplete and carry wrong identity assignments due to limited observation angles, which leads to noisy 3D triangulation results. To overcome this issue, we propose to exploit the short-range autoregressive characteristics of skeletal motion using a transformer. First, we propose an adaptive, identity-aware triangulation module to reconstruct 3D joints and identify the missing joints for each identity. To generate complete 3D skeletal motion, we then propose a Dual-Masked Auto-Encoder (D-MAE), which encodes the joint status with both skeletal-structural and temporal position encodings for trajectory completion. D-MAE's flexible masking and encoding mechanism enables arbitrary skeleton definitions to be conveniently deployed under the same framework. To demonstrate the proposed model's capability in dealing with severe data loss, we contribute a high-accuracy and challenging motion capture dataset of multi-person interactions with severe occlusion. Evaluations on both a benchmark and our new dataset demonstrate the efficiency of our proposed model, as well as its advantage over other state-of-the-art methods.
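To make the idea of encoding "joint status" with both structural and temporal positions concrete, the following is a minimal sketch of how masked input tokens for such an auto-encoder could be assembled. It is not the D-MAE implementation. Several assumptions are made for illustration only: the helper names `sinusoidal_encoding` and `build_tokens` are hypothetical, both position encodings are plain transformer-style sinusoidal codes summed together (the actual skeletal-structural encoding may instead reflect the kinematic tree), the coordinate "embedding" is a toy tiling of (x, y, z) rather than a learned projection, and only the missing-joint mask produced by triangulation is modeled, not the full dual masking scheme.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = List[float]


def sinusoidal_encoding(position: int, dim: int) -> Vec:
    """Transformer-style sinusoidal code for an integer position (assumed form)."""
    enc = []
    for i in range(dim):
        # Each (even, odd) pair of dimensions shares one frequency.
        freq = 1.0 / (10000 ** ((i - i % 2) / dim))
        angle = position * freq
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def build_tokens(
    clip: Sequence[Sequence[Optional[Sequence[float]]]],
    dim: int,
    mask_token: Vec,
) -> Tuple[List[List[Vec]], List[List[bool]]]:
    """Hypothetical token builder.

    clip[t][j] is an (x, y, z) position for joint j at frame t, or None if
    triangulation flagged that joint as missing for this identity.
    Returns (tokens, missing), where tokens[t][j] has length dim and
    missing[t][j] marks the entries a decoder would have to reconstruct.
    """
    tokens: List[List[Vec]] = []
    missing: List[List[bool]] = []
    for t, frame in enumerate(clip):
        time_pe = sinusoidal_encoding(t, dim)  # temporal position
        row: List[Vec] = []
        row_missing: List[bool] = []
        for j, joint in enumerate(frame):
            joint_pe = sinusoidal_encoding(j, dim)  # structural (joint-index) position
            if joint is None:
                content = list(mask_token)  # shared placeholder for missing joints
            else:
                # Toy embedding (assumption): tile x, y, z across dim channels.
                content = [float(joint[k % 3]) for k in range(dim)]
            row.append([c + a + b for c, a, b in zip(content, joint_pe, time_pe)])
            row_missing.append(joint is None)
        tokens.append(row)
        missing.append(row_missing)
    return tokens, missing


if __name__ == "__main__":
    # Two frames, two joints; one joint is missing in each frame.
    clip = [[(0.0, 0.0, 0.0), None], [None, (1.0, 2.0, 3.0)]]
    toks, miss = build_tokens(clip, dim=4, mask_token=[0.0] * 4)
    print(miss)  # [[False, True], [True, False]]
```

Because the structural code here depends only on the joint index, a clip with a different number of joints per frame goes through the same function unchanged, which loosely mirrors the abstract's point that arbitrary skeleton definitions can share one framework.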