Event cameras are novel sensors that perceive per-pixel intensity changes and output asynchronous event streams with high dynamic range and little motion blur. It has been shown that events alone can be used for end-task learning, e.g., semantic segmentation, based on encoder-decoder-like networks. However, since events are sparse and mostly reflect edge information, it is difficult to recover the original details relying on the decoder alone. Moreover, most methods resort to pixel-wise loss alone for supervision, which may be insufficient to fully exploit the visual details in sparse events, leading to suboptimal performance. In this paper, we propose a simple yet flexible two-stream framework named Dual Transfer Learning (DTL) that effectively enhances performance on end-tasks without adding extra inference cost. The proposed approach consists of three parts: an event-to-end-task learning (EEL) branch, an event-to-image translation (EIT) branch, and a transfer learning (TL) module that simultaneously exploits the feature-level affinity information and pixel-level knowledge from the EIT branch to improve the EEL branch. This simple yet novel method leads to strong representation learning from events, as evidenced by significant performance gains on end-tasks such as semantic segmentation and depth estimation.
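The abstract describes a training objective in which the EEL branch's task loss is augmented by two transfer terms distilled from the EIT branch: a feature-level term that matches affinity (pairwise-similarity) structure, and a pixel-level term. The following is a minimal, hypothetical sketch of such a combined objective; the function names, the use of MSE for both transfer terms, and the weights are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a DTL-style training objective: the end-task
# (EEL) loss is augmented with two transfer terms distilled from the
# event-to-image translation (EIT) branch. All names and weights here
# are illustrative assumptions, not the paper's exact losses.

def affinity(features):
    # Pairwise inner-product ("affinity") matrix over feature vectors,
    # a common way to capture feature-level relational structure.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return [[dot(f, g) for g in features] for f in features]

def mse(a, b):
    # Mean squared error between two flat lists of equal length.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def dtl_loss(task_loss, eel_feats, eit_feats, eel_pix, eit_pix,
             w_feat=0.5, w_pix=0.5):
    # Feature-level transfer: align the affinity matrices of the two
    # branches rather than the raw features themselves.
    a_eel = [v for row in affinity(eel_feats) for v in row]
    a_eit = [v for row in affinity(eit_feats) for v in row]
    l_feat = mse(a_eel, a_eit)
    # Pixel-level transfer: align the branches' dense outputs directly.
    l_pix = mse(eel_pix, eit_pix)
    # Total objective: end-task supervision plus weighted transfer terms.
    return task_loss + w_feat * l_feat + w_pix * l_pix
```

Because the transfer terms only shape training, the EIT branch and TL module can be discarded at test time, which is how the framework adds no extra inference cost.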