Event cameras offer a promising alternative for visual perception, especially in high-speed and high-dynamic-range scenes. Recently, deep learning methods have shown great success in providing model-free solutions to many event-based problems, such as optical flow estimation. However, existing deep learning methods do not adequately address the importance of temporal information in their architecture design and cannot effectively extract spatio-temporal features. Another line of research that utilizes Spiking Neural Networks suffers from training issues for deeper architectures. To address these points, a novel input representation is proposed that captures the events' temporal distribution for signal enhancement. Moreover, we introduce a spatio-temporal recurrent encoder-decoder neural network architecture for event-based optical flow estimation, which uses Convolutional Gated Recurrent Units to extract feature maps from a series of event images. In addition, our architecture allows traditional frame-based core modules, such as the correlation layer and the iterative residual refinement scheme, to be incorporated. The network is trained end to end with self-supervised learning on the Multi-Vehicle Stereo Event Camera dataset. We show that it outperforms all existing state-of-the-art methods by a large margin.
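To make the idea of an input representation that preserves the events' temporal distribution concrete, the sketch below bins raw events (x, y, timestamp, polarity) into per-bin, per-polarity count images instead of collapsing them into a single frame. This is an illustrative, generic event-binning scheme in NumPy; the function name, the exact bin layout, and the count-based accumulation are assumptions for demonstration, not the paper's precise representation.

```python
import numpy as np

def events_to_time_bins(events, height, width, num_bins):
    """Accumulate events into per-bin, per-polarity count images.

    events: (N, 4) array of rows (x, y, t, polarity), polarity in {-1, +1}.
    Returns an array of shape (num_bins, 2, height, width), so the
    temporal distribution of events is kept across num_bins slices
    rather than being flattened into one event image.
    """
    vox = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = (events[:, 3] > 0).astype(int)  # channel 0: negative, 1: positive
    # Normalize timestamps into [0, num_bins); clip the final event
    # (whose normalized time equals num_bins) into the last bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * num_bins
    b = np.clip(t_norm.astype(int), 0, num_bins - 1)
    # Unbuffered scatter-add so repeated (bin, polarity, y, x) hits accumulate.
    np.add.at(vox, (b, p, y, x), 1.0)
    return vox
```

Each temporal slice can then be fed to the recurrent encoder as one event image in the sequence, which is what lets a ConvGRU-style network extract spatio-temporal features.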