Multispectral pedestrian detection is an important task for many around-the-clock applications, since the visible and thermal modalities provide complementary information, especially under low-light conditions. To reduce the influence of hand-designed components in existing multispectral pedestrian detectors, we propose a MultiSpectral pedestrian DEtection TRansformer (MS-DETR), which extends deformable DETR to the multi-modal paradigm. To facilitate multi-modal learning, we first introduce a Reference box Constrained Cross-Attention (RCCA) module into the multi-modal Transformer decoder; it uses the fusion branch together with the reference boxes as intermediaries to enable interaction between the visible and thermal modalities. To further balance the contributions of the different modalities, we design a modality-balanced optimization strategy, which aligns the slots of the decoders by adaptively adjusting the instance-level weights of the three branches. Our end-to-end MS-DETR shows superior performance on the challenging KAIST and CVC-14 benchmark datasets.