In this letter, we present a novel markerless 3D human motion capture (MoCap) system for unstructured, outdoor environments that uses a team of autonomous unmanned aerial vehicles (UAVs) with on-board RGB cameras and computation. Existing methods are limited by their reliance on calibrated cameras and off-line processing. Thus, we present the first method (AirPose) to estimate human pose and shape using images captured by multiple extrinsically uncalibrated flying cameras. AirPose itself calibrates the cameras relative to the person instead of relying on any pre-calibration. It uses distributed neural networks running on each UAV that communicate viewpoint-independent information about the person with each other (i.e., their 3D shape and articulated pose). The person's shape and pose are parameterized using the SMPL-X body model, resulting in a compact representation that minimizes communication between the UAVs. The network is trained using synthetic images of realistic virtual environments, and fine-tuned on a small set of real images. We also introduce an optimization-based post-processing method (AirPose$^{+}$) for offline applications that require higher MoCap quality. We make our method's code and data available for research at https://github.com/robot-perception-group/AirPose. A video describing the approach and results is available at https://youtu.be/xLYe1TNHsfs.
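To make the claim about communication cost concrete, the following is a minimal sketch, not the authors' actual protocol, of how a viewpoint-independent SMPL-X person state could be serialized for exchange between UAVs. The message class, field names, and byte layout are illustrative assumptions; only the parameter dimensions (10 shape coefficients, 21 articulated body joints in axis-angle form) follow the standard SMPL-X parameterization.

```python
# Illustrative sketch (assumed message layout, not AirPose's real protocol):
# each UAV could broadcast ~79 floats per frame instead of raw image pixels,
# which is what makes the SMPL-X parameterization communication-efficient.
import struct
from dataclasses import dataclass, field

NUM_BETAS = 10        # SMPL-X shape coefficients (standard choice)
NUM_BODY_JOINTS = 21  # SMPL-X articulated body joints, 3 axis-angle DoF each

@dataclass
class PersonStateMessage:
    """Viewpoint-independent person state one UAV shares with the others."""
    betas: list = field(default_factory=lambda: [0.0] * NUM_BETAS)
    body_pose: list = field(default_factory=lambda: [0.0] * (3 * NUM_BODY_JOINTS))
    global_orient: list = field(default_factory=lambda: [0.0] * 3)  # root rotation
    transl: list = field(default_factory=lambda: [0.0] * 3)         # root translation

    def pack(self) -> bytes:
        """Serialize all parameters as little-endian 32-bit floats."""
        vals = self.betas + self.body_pose + self.global_orient + self.transl
        return struct.pack(f"<{len(vals)}f", *vals)

msg = PersonStateMessage()
payload = msg.pack()
# 10 + 63 + 3 + 3 = 79 floats -> 316 bytes per frame, versus megabytes
# for transmitting the camera images themselves.
print(f"{len(payload)} bytes per frame")
```

Under these assumptions, the per-frame payload is on the order of a few hundred bytes, which is why exchanging body-model parameters rather than images keeps inter-UAV bandwidth requirements low.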