3D hand pose estimation from monocular videos is a long-standing and challenging problem, which is now seeing a strong upturn. In this work, we address it for the first time using a single event camera, i.e., an asynchronous vision sensor reacting to brightness changes. Our EventHands approach has characteristics previously not demonstrated with a single RGB or depth camera, such as high temporal resolution at low data throughputs and real-time performance at 1000 Hz. Due to the different data modality of event cameras compared to classical cameras, existing methods cannot be directly applied to, or re-trained for, event streams. We thus design a new neural approach that accepts a new event-stream representation suitable for learning, is trained on newly generated synthetic event streams, and generalises to real data. Experiments show that EventHands outperforms recent monocular methods using a colour (or depth) camera in terms of accuracy and in its ability to capture hand motions of unprecedented speed. Our method, the event stream simulator and the dataset are publicly available; see https://4dqv.mpi-inf.mpg.de/EventHands/
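To illustrate the kind of learnable event-stream representation the abstract refers to (the paper calls its variant Locally-Normalised Event Surfaces, LNES): events in a fixed time window are accumulated into a two-channel image of window-normalised timestamps, one channel per polarity. Below is a minimal sketch of this idea; the function name, sensor resolution, and event tuple layout are illustrative assumptions, not the authors' code.

```python
import numpy as np

def events_to_lnes(events, window_start, window_len, height=180, width=240):
    """Accumulate one window of events into a 2-channel image.

    Each event is a tuple (x, y, t, polarity) with polarity in {0, 1}.
    The pixel of the matching polarity channel stores the window-normalised
    timestamp (t - window_start) / window_len, so recent events appear close
    to 1 and older ones fade towards 0. Assumes events arrive in temporal
    order, so the latest event at a pixel naturally overwrites older ones.
    """
    frame = np.zeros((height, width, 2), dtype=np.float32)
    for x, y, t, p in events:
        frame[y, x, int(p)] = (t - window_start) / window_len
    return frame

# Example: three synthetic events (x, y, t in seconds, polarity 0/1)
# accumulated over a 10 ms window.
events = [(10, 20, 0.001, 1), (11, 20, 0.004, 0), (10, 20, 0.009, 1)]
lnes = events_to_lnes(events, window_start=0.0, window_len=0.01)
```

A fixed-size image like this can be fed to a standard CNN while still preserving fine temporal ordering within the window, which is what makes such representations suitable for learning from asynchronous event data.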