Real-time human motion reconstruction from a sparse set of wearable IMUs (e.g., six) provides a non-intrusive and economical approach to motion capture. Because IMUs cannot measure position directly, recent works have taken data-driven approaches that use large human motion datasets to tackle this under-determined problem. Still, challenges remain, such as temporal consistency, drift in global and joint motions, and coverage of diverse motion types on various terrains. We propose a novel method that simultaneously estimates full-body motion and generates plausible visited terrain from only six IMU sensors in real time. Our method incorporates (1) a conditional Transformer decoder model that gives consistent predictions by explicitly reasoning about its prediction history, (2) a simple yet general learning target named "stationary body points" (SBPs), which can be stably predicted by the Transformer model and used by analytical routines to correct joint and global drift, and (3) an algorithm that generates regularized terrain height maps from noisy SBP predictions, which can in turn correct noisy global motion estimates. We evaluate our framework extensively on synthesized and real IMU data and in real-time live demos, and show superior performance over strong baseline methods.