Marker-based optical motion capture (mocap) is the "gold standard" method for acquiring accurate 3D human motion in computer vision, medicine, and graphics. The raw output of these systems is a set of noisy and incomplete 3D points or short tracklets of points. To be useful, these points must be associated with the corresponding markers on the captured subject, i.e., "labeling". Given these labels, one can then "solve" for the 3D skeleton or body surface mesh. Commercial auto-labeling tools require a specific calibration procedure at capture time, which is not possible for archival data. Here we train a novel neural network called SOMA, which takes raw mocap point clouds with varying numbers of points and labels them at scale without any calibration data, independently of the capture technology, and with only minimal human intervention. Our key insight is that, while labeling point clouds is highly ambiguous, the 3D body provides strong constraints on the solution that can be exploited by a learning-based method. To enable learning, we generate massive training sets of simulated noisy and ground-truth mocap markers animated by 3D bodies from AMASS. SOMA exploits an architecture with stacked self-attention elements to learn the spatial structure of the 3D body and an optimal transport layer to constrain the assignment (labeling) problem while rejecting outliers. We extensively evaluate SOMA both quantitatively and qualitatively. SOMA is more accurate and robust than existing state-of-the-art research methods and can be applied where commercial systems cannot. We automatically label over 8 hours of archival mocap data across 4 different datasets captured using various technologies and output SMPL-X body models. The model and data are released for research purposes at https://soma.is.tue.mpg.de/.
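To illustrate how an optimal transport layer can constrain a labeling problem while rejecting outliers, the following is a minimal NumPy sketch of Sinkhorn normalization with an extra "null" row and column. It is not SOMA's implementation: the function names `sinkhorn_with_null` and `label_points`, the `null_score` parameter, the choice of marginals, and the toy score matrix are all assumptions made for illustration. In a full model, the point-to-label scores would come from a learned network such as the self-attention stack described above.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_with_null(scores, null_score=0.0, n_iters=50):
    """Soft assignment of points (rows) to labels (columns).

    Hypothetical helper: a null column absorbs outlier points and a null row
    absorbs labels with no observed point (e.g. occluded markers).
    Returns a matrix of shape (n_points + 1, n_labels + 1).
    """
    n_points, n_labels = scores.shape
    aug = np.full((n_points + 1, n_labels + 1), null_score, dtype=float)
    aug[:n_points, :n_labels] = scores

    # Assumed marginals: each real point and label carries mass 1; the null
    # row and column carry enough mass to absorb everything unmatched.
    log_mu = np.log(np.concatenate([np.ones(n_points), [n_labels]]))
    log_nu = np.log(np.concatenate([np.ones(n_labels), [n_points]]))

    u = np.zeros(n_points + 1)
    v = np.zeros(n_labels + 1)
    for _ in range(n_iters):
        # Alternate row and column normalization in the log domain.
        u = log_mu - _logsumexp(aug + v[None, :], axis=1)
        v = log_nu - _logsumexp(aug + u[:, None], axis=0)
    return np.exp(aug + u[:, None] + v[None, :])


def label_points(scores, **kwargs):
    """Hard labels per point; -1 marks a point assigned to the null label."""
    n_points, n_labels = scores.shape
    plan = sinkhorn_with_null(scores, **kwargs)
    labels = np.argmax(plan[:n_points], axis=1)
    return np.where(labels == n_labels, -1, labels)


# Toy example: three points each match one label; the fourth matches none.
toy_scores = np.array([
    [5.0, -5.0, -5.0],
    [-5.0, 5.0, -5.0],
    [-5.0, -5.0, 5.0],
    [-5.0, -5.0, -5.0],
])
print(label_points(toy_scores))
```

In this sketch the normalization encourages each label to be claimed by at most one point, while the null column gives spurious points somewhere to go instead of forcing them onto a real marker label.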