Monocular 3D motion capture (mocap) is beneficial to many applications. The use of a single camera, however, often fails to handle occlusions of different body parts and is hence limited to capturing relatively simple movements. We present a lightweight, hybrid mocap technique called HybridCap that augments the camera with only 4 Inertial Measurement Units (IMUs) in a learning-and-optimization framework. We first employ a weakly-supervised, hierarchical motion inference module based on cooperative Gated Recurrent Unit (GRU) blocks that serve as limb, body and root trackers as well as an inverse kinematics solver. Our network effectively narrows the search space of plausible motions via coarse-to-fine pose estimation and tackles challenging movements with high efficiency. We further develop a hybrid optimization scheme that combines inertial feedback and visual cues to improve tracking accuracy. Extensive experiments on various datasets demonstrate that HybridCap can robustly handle challenging movements ranging from fitness actions to Latin dance. It also achieves real-time performance at up to 60 fps with state-of-the-art accuracy.
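To give a feel for what combining inertial feedback and visual cues can mean, the following is a minimal toy sketch and not the optimization used in HybridCap. It assumes each joint has a visual position estimate with a confidence value (low when the joint is occluded) and an inertial position estimate, and it minimizes a per-joint quadratic energy in closed form. The function name `fuse_joint_estimates`, the weight `w_inertial`, and the energy itself are illustrative assumptions.

```python
import numpy as np


def fuse_joint_estimates(visual_pos, inertial_pos, visual_conf, w_inertial=1.0):
    """Toy fusion of visual and inertial joint positions (hypothetical helper).

    For each joint, minimise
        E(x) = c * ||x - x_vis||^2 + w * ||x - x_imu||^2
    whose closed-form minimiser is x = (c * x_vis + w * x_imu) / (c + w).
    Here c is the visual confidence and w is the inertial weight (w > 0).
    """
    visual_pos = np.asarray(visual_pos, dtype=float)      # shape (J, 3)
    inertial_pos = np.asarray(inertial_pos, dtype=float)  # shape (J, 3)
    c = np.asarray(visual_conf, dtype=float)[:, None]     # shape (J, 1)
    return (c * visual_pos + w_inertial * inertial_pos) / (c + w_inertial)


# Two joints: the first is clearly visible, the second is fully occluded.
visual = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
inertial = [[2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
confidence = [1.0, 0.0]

fused = fuse_joint_estimates(visual, inertial, confidence)
print(fused)
```

With equal weights, the visible joint lands midway between the two estimates, while the occluded joint falls back entirely on the inertial estimate. This is the intuition behind adding IMUs to a single camera, though a real system would optimize over body pose rather than independent joint positions.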