Event cameras are novel vision sensors that sample, in an asynchronous fashion, brightness increments with low latency and high temporal resolution. The resulting streams of events are of high value by themselves, especially for high-speed motion estimation. However, a growing body of work has also focused on the reconstruction of intensity frames from the events, as this allows bridging the gap with the existing literature on appearance- and frame-based computer vision. Recent work has mostly approached this problem using neural networks trained with synthetic, ground-truth data. In this work we approach, for the first time, the intensity reconstruction problem from a self-supervised learning perspective. Our method, which leverages the knowledge of the inner workings of event cameras, combines estimated optical flow and the event-based photometric constancy to train neural networks without the need for any ground-truth or synthetic data. Results across multiple datasets show that the performance of the proposed self-supervised approach is in line with the state of the art. Additionally, we propose a novel, lightweight neural network for optical flow estimation that achieves high-speed inference with only a minor drop in performance.
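For readers unfamiliar with the term, the event-based photometric constancy invoked above can be sketched as follows; this is written in our own notation under the standard linearized event-generation model, not a verbatim statement of the paper's formulation. An event carries a polarity $p_i \in \{-1, +1\}$ and fires whenever the log-brightness $L$ at a pixel changes by the contrast threshold $C$, so the events accumulated over a short interval $\Delta t$ approximate the per-pixel brightness increment, which brightness constancy in turn ties to the optical flow $\mathbf{u}$:

\[
\Delta L(\mathbf{x}) \;\approx\; C \sum_{i} p_i \, \delta(\mathbf{x} - \mathbf{x}_i) \;\approx\; -\nabla L(\mathbf{x}) \cdot \mathbf{u}(\mathbf{x}) \, \Delta t .
\]

A self-supervised loss can then penalize the discrepancy between the event-accumulated increment (left) and the increment predicted from the reconstructed intensity and the estimated flow (right), which is what allows training without ground-truth or synthetic frames.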