Moving Object Detection (MOD) is a critical vision task for achieving safe autonomous driving. Despite the plausible results of deep learning methods, most existing approaches are frame-based only and may fail to reach reasonable performance when dealing with dynamic traffic participants. Recent advances in sensor technologies, especially the event camera, can naturally complement the conventional camera approach to better model moving objects. However, event-based works often adopt a pre-defined time window for event representation and simply integrate it to estimate image intensities from events, neglecting much of the rich temporal information in the available asynchronous events. Therefore, from a new perspective, we propose RENet, a novel RGB-Event fusion network that jointly exploits the two complementary modalities to achieve more robust MOD under challenging scenarios for autonomous driving. Specifically, we first design a temporal multi-scale aggregation module to fully leverage event frames from both the RGB exposure time and larger intervals. Then we introduce a bi-directional fusion module to attentively calibrate and fuse multi-modal features. To evaluate the performance of our network, we carefully select and annotate a sub-MOD dataset from the commonly used DSEC dataset. Extensive experiments demonstrate that our proposed method performs significantly better than the state-of-the-art RGB-Event fusion alternatives. The source code and dataset are publicly available at: https://github.com/ZZY-Zhou/RENet.
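To make the temporal multi-scale aggregation idea concrete, the following is a minimal NumPy sketch of the common event-frame representation: asynchronous events (t, x, y, polarity) are accumulated into 2-channel count frames over several temporal windows ending at a reference time, e.g. the RGB exposure time plus progressively larger intervals. The function names and the window scheme are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def events_to_frame(events, height, width, t_start, t_end):
    """Accumulate asynchronous events (t, x, y, p) with t in [t_start, t_end)
    into a 2-channel count frame, one channel per polarity.
    Hypothetical sketch of a standard event-frame representation."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    mask = (events[:, 0] >= t_start) & (events[:, 0] < t_end)
    for t, x, y, p in events[mask]:
        frame[int(p), int(y), int(x)] += 1.0
    return frame

def multi_scale_event_frames(events, height, width, t_ref, windows):
    """Build one event frame per temporal window ending at t_ref,
    e.g. the RGB exposure time and several larger intervals."""
    return [events_to_frame(events, height, width, t_ref - w, t_ref)
            for w in windows]

# Toy usage: three events, two window scales ending at t_ref = 10.0.
events = np.array([[0.0, 1, 1, 0],
                   [5.0, 2, 2, 1],
                   [9.0, 3, 3, 1]])
frames = multi_scale_event_frames(events, 4, 4, 10.0, [2.0, 10.0])
# frames[0] covers [8, 10) and holds 1 event; frames[1] covers [0, 10) and holds 3.
```

The resulting list of frames would then be fed to the aggregation module; the short window captures fast motion aligned with the RGB exposure, while the larger intervals retain slower temporal context.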