The event camera is an emerging imaging sensor that captures the dynamics of moving objects as events, which motivates our work on estimating 3D human pose and shape from event signals. Events, on the other hand, pose unique challenges: rather than capturing static body postures, event signals are best at capturing local motions. This leads us to propose a two-stage deep learning approach, called EventHPE. The first stage, FlowNet, is trained with unsupervised learning to infer optical flow from events. Both events and optical flow are closely related to human body dynamics, and both are fed as input to ShapeNet in the second stage to estimate 3D human shapes. To mitigate the discrepancy between image-based flow (optical flow) and shape-based flow (the movement of human body shape vertices), a novel flow coherence loss is introduced, exploiting the fact that both flows originate from the same human motion. We curate an in-house event-based 3D human dataset with 3D pose and shape annotations, which is, to our knowledge, the largest of its kind to date. Empirical evaluations on the DHP19 dataset and our in-house dataset demonstrate the effectiveness of our approach.
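To make the flow coherence idea concrete, a minimal sketch is given below. The abstract does not specify the loss formula, so this is only an illustrative assumption: both the image-based optical flow and the shape-based vertex flow (projected to the image plane) are sampled at the same pixel locations, and directional disagreement between the two is penalized via cosine similarity. The function name and signature are hypothetical.

```python
import numpy as np

def flow_coherence_loss(optical_flow, vertex_flow, eps=1e-8):
    """Hypothetical coherence loss between two flow fields.

    optical_flow, vertex_flow: (N, 2) arrays of 2D motion vectors
    sampled at the same image locations. Returns a scalar loss that
    is 0 when the flows point in identical directions and grows as
    their directions disagree.
    """
    # Normalize each flow vector to unit length (eps avoids divide-by-zero).
    of = optical_flow / (np.linalg.norm(optical_flow, axis=1, keepdims=True) + eps)
    vf = vertex_flow / (np.linalg.norm(vertex_flow, axis=1, keepdims=True) + eps)
    # Per-location cosine similarity; loss is 1 minus its mean.
    cos = np.sum(of * vf, axis=1)
    return float(np.mean(1.0 - cos))
```

Since both flows arise from the same underlying body motion, this term ties the unsupervised optical-flow branch to the shape branch without requiring flow ground truth.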