Autonomous driving is an exciting new industry, posing important research questions. Within the perception module, 3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians. While hardware systems and sensors have dramatically improved over the decades -- with cars potentially boasting complex LiDAR and vision systems and with a growing expansion of the available body of dedicated datasets for this newly available information -- not much work has been done to harness these novel signals for the core problem of 3D human pose estimation. Our method, which we coin HUM3DIL (HUMan 3D from Images and LiDAR), efficiently makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin. It is a fast and compact model for onboard deployment. Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages. Quantitative experiments on the Waymo Open Dataset support these claims, where we achieve state-of-the-art results on the task of 3D pose estimation.
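To make the fusion idea concrete, the following is a minimal sketch, not the authors' implementation: LiDAR points are projected into the image, pixel-aligned image features are bilinearly sampled at those locations, concatenated with the 3D point coordinates, and refined by a Transformer encoder before regressing 3D keypoints. All module names, dimensions, the placeholder CNN backbone, and the pooling/keypoint head are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAlignedPoseNet(nn.Module):
    def __init__(self, img_feat_dim=64, d_model=128, num_joints=15, num_layers=4):
        super().__init__()
        # Lightweight CNN producing a pixel-aligned feature map (placeholder backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, img_feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(img_feat_dim, img_feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Embed each LiDAR point from its 3D coordinates plus its sampled image feature.
        self.point_embed = nn.Linear(3 + img_feat_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.refiner = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Regress 3D keypoints from the pooled refined point features.
        self.head = nn.Linear(d_model, num_joints * 3)

    def forward(self, image, points, uv):
        # image:  (B, 3, H, W) RGB crop of the pedestrian
        # points: (B, N, 3)    LiDAR points in the camera/box frame
        # uv:     (B, N, 2)    image-plane projections of the points, normalized to [-1, 1]
        feat_map = self.backbone(image)                                 # (B, C, H, W)
        grid = uv.unsqueeze(2)                                          # (B, N, 1, 2)
        # Bilinearly sample pixel-aligned image features at the projected point locations.
        sampled = F.grid_sample(feat_map, grid, align_corners=False)    # (B, C, N, 1)
        sampled = sampled.squeeze(-1).transpose(1, 2)                   # (B, N, C)
        tokens = self.point_embed(torch.cat([points, sampled], dim=-1)) # (B, N, d_model)
        refined = self.refiner(tokens)                                  # (B, N, d_model)
        pooled = refined.mean(dim=1)                                    # (B, d_model)
        return self.head(pooled).view(-1, self.head.out_features // 3, 3)

# Example forward pass with random data.
model = PixelAlignedPoseNet()
out = model(torch.randn(2, 3, 128, 64), torch.randn(2, 256, 3), torch.rand(2, 256, 2) * 2 - 1)
print(out.shape)  # torch.Size([2, 15, 3])

The key design choice illustrated here is that the LiDAR points themselves serve as the query locations for image features, so the two modalities are fused per point before any global reasoning; the actual refinement architecture and training setup in the paper may differ.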