The RGB complementary metal-oxide-semiconductor (CMOS) sensor works within the visible light spectrum and is therefore very sensitive to environmental lighting conditions. In contrast, a long-wave infrared (LWIR) sensor operating in the 8-14 µm spectral band functions independently of visible light. In this paper, we exploit both visual and thermal perception units for robust object detection. After careful synchronization and (cross-)labeling of the FLIR [1] dataset, the multi-modal perception data is passed through a convolutional neural network (CNN) to detect three critical object classes on the road, namely pedestrians, bicycles, and cars. After evaluating the RGB and infrared sensors separately (the terms thermal and infrared are often used interchangeably), we compare various network structures for fusing the data effectively at the feature level. Our RGB-thermal (RGBT) fusion network, which takes advantage of a novel entropy-block attention module (EBAM), outperforms the state-of-the-art network [2] by 10% with 82.9% mAP.
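As a rough illustration of what fusing the two modalities at the feature level can look like, the sketch below concatenates an RGB feature map and a thermal feature map along the channel axis and reweights the channels with a simple softmax over their global-average responses. It is a generic stand-in, not the EBAM proposed in this paper: the function name `fuse_features`, the `(C, H, W)` layout, and the weighting rule are all assumptions made for the example.

```python
import numpy as np


def fuse_features(rgb_feat, thermal_feat):
    """Fuse two (C, H, W) feature maps at the feature level.

    Hypothetical illustration only: concatenate along the channel axis,
    then scale each channel by a softmax weight derived from its
    global-average response. This is NOT the paper's EBAM.
    """
    if rgb_feat.shape[1:] != thermal_feat.shape[1:]:
        raise ValueError("feature maps must share the same spatial size (H, W)")

    # Stack the modalities channel-wise: (C_rgb + C_thermal, H, W).
    stacked = np.concatenate([rgb_feat, thermal_feat], axis=0)

    # Global average pooling gives one scalar descriptor per channel.
    pooled = stacked.mean(axis=(1, 2))

    # Numerically stable softmax over channels; the weights sum to 1.
    exp = np.exp(pooled - pooled.max())
    weights = exp / exp.sum()

    # Broadcast the per-channel weights over the spatial dimensions.
    fused = stacked * weights[:, None, None]
    return fused, weights


# Random arrays stand in for backbone features of the two sensors.
rng = np.random.default_rng(0)
rgb = rng.random((4, 8, 8))
thermal = rng.random((4, 8, 8))
fused, weights = fuse_features(rgb, thermal)
print(fused.shape)  # (8, 8, 8)
```

In a real detector the fused tensor would feed the detection head, and the channel weights would be learned rather than computed from a fixed rule.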