Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes with microsecond accuracy, high dynamic range, and low power consumption. Despite these advantages, event cameras cannot be directly applied to computational imaging tasks because high-quality intensity frames and events cannot be obtained simultaneously. This paper aims to connect a standalone event camera and a modern intensity camera so that applications can take advantage of both sensors. We establish this connection through a multi-modal stereo matching task. We first convert events into a reconstructed image and extend existing stereo networks to this multi-modal condition. We propose a self-supervised method to train the multi-modal stereo network without ground-truth disparity data. A structure loss computed on image gradients enables self-supervised learning on such multi-modal data. Exploiting the internal stereo constraint between views of different modalities, we introduce general stereo loss functions, including a disparity cross-consistency loss and an internal disparity loss, which improve performance and robustness over existing approaches. Experiments on both synthetic and real datasets demonstrate the effectiveness of the proposed method, especially the proposed general stereo loss functions. Finally, we shed light on employing the aligned events and intensity images in downstream tasks, e.g., video interpolation.
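To make the two loss ideas concrete, here is a minimal sketch of (a) a structure loss compared on image gradients rather than raw intensities, and (b) a disparity cross-consistency check between the two views. This is an illustrative reading of the abstract, not the paper's implementation: the function names are made up, gradients are simple forward differences (the paper may use Sobel filters or a learned representation), and warping is nearest-neighbour without occlusion handling.

```python
import numpy as np

def image_gradients(img):
    # Forward differences along x and y; img is an (H, W) float array.
    gx = np.diff(img, axis=1)
    gy = np.diff(img, axis=0)
    return gx, gy

def structure_loss(pred, target):
    # Compare edge structure instead of absolute intensity, which makes the
    # loss less sensitive to the modality gap between an event-reconstructed
    # image and a regular intensity image (e.g., a global brightness offset).
    gx_p, gy_p = image_gradients(pred)
    gx_t, gy_t = image_gradients(target)
    return np.mean(np.abs(gx_p - gx_t)) + np.mean(np.abs(gy_p - gy_t))

def warp_right_to_left(right, disp_l):
    # Sample the right view at x - d(x) for every left pixel
    # (nearest-neighbour; a real system would use bilinear sampling).
    h, w = right.shape
    xs = np.clip(np.arange(w)[None, :] - np.rint(disp_l).astype(int), 0, w - 1)
    rows = np.arange(h)[:, None]
    return right[rows, xs]

def cross_consistency_loss(disp_l, disp_r):
    # The left disparity map and the right disparity map warped into the
    # left view should agree wherever the estimate is consistent.
    warped = warp_right_to_left(disp_r, disp_l)
    return np.mean(np.abs(disp_l - warped))
```

For a constant-disparity scene both losses vanish: `structure_loss` is zero for two images that differ only by a brightness offset, and `cross_consistency_loss` is zero when both views predict the same constant disparity.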