Event cameras are emerging bio-inspired vision sensors that report per-pixel brightness changes asynchronously. They offer notable advantages, namely high dynamic range, high-speed response, and a low power budget, which make them well suited to capturing local motion in uncontrolled environments. This motivates us to unlock the potential of event cameras for human pose estimation, a task that has rarely been explored with this sensor. However, because of the paradigm shift from conventional frame-based cameras, the event signals within a time interval contain very limited information. Event cameras capture only the moving body parts and ignore static ones, so some parts appear incomplete or even disappear entirely within the interval. This paper proposes a novel densely connected recurrent architecture to address this problem of incomplete information. With this recurrent architecture, we explicitly model not only the sequential but also the non-sequential geometric consistency across time steps. This accumulates information from previous frames to recover the entire human body and yields stable and accurate human pose estimation from event data. Moreover, to better evaluate our model, we collect a large-scale multimodal event-based dataset with human pose annotations. To the best of our knowledge, it is by far the most challenging dataset of its kind. Experimental results on two public datasets and on our own dataset demonstrate the effectiveness and strength of our approach. The code is available online to facilitate future research.
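To make the idea of "sequential and non-sequential" connections across time steps concrete, the following is a minimal NumPy sketch of a densely connected recurrent layer. It is a toy illustration under stated assumptions and not the architecture from this paper. The abstract does not specify the cell type, the feature shapes, or how earlier states are weighted. Here we assume that each time step has a feature vector and that the hidden state at step t sums projections of *all* earlier hidden states. Each lag d = t - k has its own weight matrix: lag 1 is the ordinary sequential link, and larger lags are the non-sequential (skip) links. The function name `dense_recurrent_forward` and its parameters are hypothetical.

```python
import numpy as np


def dense_recurrent_forward(x, w_in, w_lag, b):
    """Toy densely connected recurrent layer (hypothetical illustration).

    x:     (T, D) per-time-step input features, e.g. features of event frames.
    w_in:  (D, H) input projection.
    w_lag: (T - 1, H, H); w_lag[d - 1] maps the hidden state from d steps back.
           d == 1 is the sequential link; d > 1 are non-sequential skip links.
    b:     (H,) bias.
    Returns the hidden states, shape (T, H).
    """
    T = x.shape[0]
    H = w_in.shape[1]
    h = np.zeros((T, H))
    for t in range(T):
        pre = x[t] @ w_in + b
        # Aggregate every earlier hidden state, not only h[t - 1], so cues
        # about body parts seen only at early steps can reach step t directly.
        for k in range(t):
            pre = pre + h[k] @ w_lag[t - k - 1]
        h[t] = np.tanh(pre)
    return h


# Usage: a "body part" is observed only at step 0, and the sequential links
# are zeroed out. Only the lag-3 skip link carries its state to step 3.
T, D, H = 4, 2, 2
x = np.zeros((T, D))
x[0] = [1.0, 0.0]
w_lag = np.zeros((T - 1, H, H))
w_lag[2] = np.eye(H)  # direct link from step t - 3 to step t
h = dense_recurrent_forward(x, np.eye(D, H), w_lag, np.zeros(H))
```

In this example, steps 1 and 2 receive no new evidence and no sequential signal, so their hidden states stay at zero. Step 3 still recovers the step-0 information through the dense skip connection. A plain chain recurrence whose sequential links had faded would lose that information. A real model would operate on spatial feature maps with learned (e.g. convolutional) transforms rather than fixed per-lag matrices.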