Standard frame-based cameras, which sample light intensity at a fixed rate, suffer from motion blur under high-speed motion and fail to perceive the scene accurately under high dynamic range conditions. Event-based cameras, on the other hand, overcome these limitations by asynchronously detecting changes in individual pixel intensities. However, event cameras only provide information about pixels in motion, leading to sparse data and making it difficult to estimate the overall dense behavior of pixels. To address these sensor-specific limitations, we present Fusion-FlowNet, a sensor fusion framework for energy-efficient optical flow estimation that uses both frame- and event-based sensors, leveraging their complementary characteristics. Our proposed network architecture is likewise a fusion of Spiking Neural Networks (SNNs) and Analog Neural Networks (ANNs), where the two branches simultaneously process asynchronous event streams and regular frame-based images, respectively. The network is trained end-to-end using unsupervised learning to avoid expensive video annotations. The method generalizes well across distinct environments (rapid motion and challenging lighting conditions) and achieves state-of-the-art optical flow prediction on the Multi-Vehicle Stereo Event Camera (MVSEC) dataset. Furthermore, our network offers substantial savings in the number of network parameters and computational energy cost.
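To make the hybrid SNN/ANN fusion idea concrete, the following is a minimal sketch in PyTorch, not the authors' Fusion-FlowNet implementation: it assumes hand-rolled leaky integrate-and-fire (LIF) dynamics for the event branch, a small convolutional frame branch, and a simple decoder that fuses both into a dense two-channel flow map. All layer sizes, the soft-reset rule, and the fusion point are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's architecture):
# an SNN branch accumulates an event stream over T steps, an ANN branch
# encodes a grayscale frame, and fused features are decoded into dense flow.
import torch
import torch.nn as nn


class LIFConv(nn.Module):
    """Convolution followed by leaky integrate-and-fire dynamics."""

    def __init__(self, in_ch, out_ch, beta=0.9, threshold=1.0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
        self.beta, self.threshold = beta, threshold

    def forward(self, x, mem):
        # Leaky membrane update; a spike is emitted when the threshold is crossed.
        # (Training would need a surrogate gradient for this hard threshold.)
        mem = self.beta * mem + self.conv(x)
        spk = (mem >= self.threshold).float()
        mem = mem - spk * self.threshold  # soft reset
        return spk, mem


class FusionFlowSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.snn1 = LIFConv(2, 16)     # two event-polarity channels in
        self.snn2 = LIFConv(16, 32)
        self.ann = nn.Sequential(      # frame branch
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(  # fuse branches and predict (u, v) flow
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, events, frame):
        # events: (B, T, 2, H, W) spike tensor; frame: (B, 1, H, W) image.
        B, T, _, H, W = events.shape
        mem1 = torch.zeros(B, 16, H // 2, W // 2, device=events.device)
        mem2 = torch.zeros(B, 32, H // 4, W // 4, device=events.device)
        for t in range(T):  # step through the asynchronous event stream
            spk1, mem1 = self.snn1(events[:, t], mem1)
            spk2, mem2 = self.snn2(spk1, mem2)
        fused = torch.cat([mem2, self.ann(frame)], dim=1)
        return self.decoder(fused)


if __name__ == "__main__":
    model = FusionFlowSketch()
    flow = model(torch.zeros(1, 5, 2, 64, 64), torch.zeros(1, 1, 64, 64))
    print(flow.shape)  # torch.Size([1, 2, 64, 64])
```

In an unsupervised setting of the kind the abstract describes, such a model would typically be trained with a photometric warping loss between consecutive frames plus a smoothness regularizer, rather than ground-truth flow labels.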