Moving Object Detection (MOD) is a critical vision task for achieving safe autonomous driving. Despite the promising results of deep learning methods, most existing approaches are only frame-based and may fail to reach reasonable performance when dealing with dynamic traffic participants. Recent advances in sensor technologies, especially the event camera, can naturally complement the conventional camera approach to better model moving objects. However, event-based works often adopt a pre-defined time window for event representation and simply integrate the events within it to estimate image intensities, neglecting much of the rich temporal information in the available asynchronous event stream. Therefore, from a new perspective, we propose RENet, a novel RGB-Event fusion Network that jointly exploits the two complementary modalities to achieve more robust MOD under challenging scenarios for autonomous driving. Specifically, we first design a temporal multi-scale aggregation module to fully leverage event frames from both the RGB exposure time and larger intervals. Then we introduce a bi-directional fusion module to attentively calibrate and fuse multi-modal features. To evaluate the performance of our network, we carefully select and annotate a sub-MOD dataset from the commonly used DSEC dataset. Extensive experiments demonstrate that our proposed method performs significantly better than the state-of-the-art RGB-Event fusion alternatives.
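To make the multi-scale event representation concrete, the following is a minimal NumPy sketch of accumulating asynchronous events into frames over nested time windows (e.g., the RGB exposure time and larger intervals), as the aggregation module consumes. The function names, window sizes, and event layout `(t, x, y, p)` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def events_to_frame(events, h, w, t_start, t_end):
    """Accumulate polarity-signed events with timestamps in [t_start, t_end)
    into a single (h, w) frame. `events` is an (N, 4) array of (t, x, y, p)."""
    mask = (events[:, 0] >= t_start) & (events[:, 0] < t_end)
    frame = np.zeros((h, w), dtype=np.float32)
    for t, x, y, p in events[mask]:
        frame[int(y), int(x)] += 1.0 if p > 0 else -1.0
    return frame

def multi_scale_event_frames(events, h, w, t_ref, scales=(0.01, 0.05, 0.10)):
    """Stack event frames over nested windows of different durations
    (in seconds), all ending at the reference time t_ref."""
    return np.stack(
        [events_to_frame(events, h, w, t_ref - s, t_ref) for s in scales]
    )
```

The stacked output (one channel per temporal scale) preserves short-term motion cues from the exposure-time window alongside longer-term context, which is the intuition behind aggregating event frames at multiple intervals.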