Cascaded information enhancement and cross-modal attention feature fusion for multispectral pedestrian detection

Multispectral pedestrian detection aims to detect and locate pedestrians in color and thermal images, and has been widely used in autonomous driving, video surveillance, and related applications. Most existing multispectral pedestrian detection algorithms have achieved only limited success because they do not account for the confusion between pedestrian information and background noise in color and thermal images. We propose a multispectral pedestrian detection algorithm that consists mainly of a cascaded information enhancement module and a cross-modal attention feature fusion module. The cascaded information enhancement module applies channel and spatial attention to the features fused by a cascaded feature fusion block, then multiplies the single-modal features by the resulting attention weights element by element, which enhances the pedestrian features in each modality and suppresses background interference. The cross-modal attention feature fusion module mines the features of the color and thermal modalities so that they complement each other, constructs global features by adding the cross-modal complemented features element by element, and applies attention weighting to these global features to fuse the two modalities effectively. Finally, the fused features are fed into the detection head to detect and locate pedestrians. Extensive experiments were performed on two improved sets of annotations (sanitized annotations and paired annotations) of the public KAIST dataset. The results show that our method achieves a lower pedestrian miss rate and more accurate pedestrian detection boxes than the compared methods. Ablation experiments further confirm the effectiveness of each module designed in this paper.
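The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the learned convolutional layers of the real modules are replaced by parameter-free pooling followed by a sigmoid, the cascaded feature fusion block is approximated by a plain average, and all function names (`channel_attention`, `spatial_attention`, `cascaded_enhancement`, `cross_modal_fusion`) are hypothetical. Feature maps are assumed to have shape (channels, height, width).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # One weight in (0, 1) per channel, from global average pooling.
    # Output shape: (C, 1, 1). Assumption: no learned layers.
    return sigmoid(feat.mean(axis=(1, 2), keepdims=True))

def spatial_attention(feat):
    # One weight in (0, 1) per spatial location, from the channel-wise mean.
    # Output shape: (1, H, W). Assumption: no learned layers.
    return sigmoid(feat.mean(axis=0, keepdims=True))

def cascaded_enhancement(color, thermal):
    # Stand-in for the cascaded feature fusion block: a simple average.
    fused = 0.5 * (color + thermal)
    # Channel and spatial attention computed on the fused features;
    # broadcasting gives a weight of shape (C, H, W).
    weight = channel_attention(fused) * spatial_attention(fused)
    # Element-wise multiplication of each single-modal feature by the weight.
    return color * weight, thermal * weight

def cross_modal_fusion(color, thermal):
    # Each modality is complemented by attention-weighted features
    # of the other modality (hypothetical form of the complement step).
    color_comp = color + thermal * channel_attention(thermal)
    thermal_comp = thermal + color * channel_attention(color)
    # Global features: element-wise sum of the complemented features.
    global_feat = color_comp + thermal_comp
    # Attention weighting of the global features gives the fused output.
    return global_feat * channel_attention(global_feat)

# Toy feature maps with 4 channels and 8 x 8 spatial size.
rng = np.random.default_rng(0)
color = rng.standard_normal((4, 8, 8))
thermal = rng.standard_normal((4, 8, 8))

enhanced_color, enhanced_thermal = cascaded_enhancement(color, thermal)
fused = cross_modal_fusion(enhanced_color, enhanced_thermal)
print(fused.shape)  # (4, 8, 8)
```

Because every attention weight lies in (0, 1), the enhancement step can only scale a feature down, never amplify it. In this simplified form, "enhancing" pedestrian features therefore means attenuating background responses more strongly than pedestrian responses. In the sketch the fused tensor is the final output; in the paper it is passed on to the detection head.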
