Multimodal learning, particularly for pedestrian detection, has recently received attention for its ability to perform reliably in critical autonomous-driving scenarios such as low-light, night-time, and adverse weather conditions. However, in most cases the training distribution emphasizes the contribution of one specific input, which biases the network towards that modality. Generalization then becomes a significant problem, because the modality that is non-dominant during training may contribute more at inference time. Here, we introduce a novel training setup with a regularizer in the multimodal architecture to resolve this disparity between modalities. Specifically, our regularizer term makes the feature fusion more robust by treating both feature extractors as equally important during training when extracting the multimodal distribution, which we refer to as removing the imbalance problem. Furthermore, our decoupled output streams aid the detection task by mutually sharing spatially sensitive information. Extensive experiments on the KAIST and UTokyo datasets show that the proposed method improves upon the respective state-of-the-art performance.
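The abstract does not specify the exact form of the balancing regularizer, so the following is only a minimal illustrative sketch of the general idea: a penalty that discourages one modality branch from dominating the fused representation. The class name, the choice of penalizing the gap between mean feature magnitudes, and the `weight` hyperparameter are all assumptions introduced here for illustration, not the authors' formulation.

```python
# Illustrative sketch only: the paper's exact regularizer is not given in the
# abstract. This assumes a simple penalty on the gap between the mean feature
# magnitudes of the two modality branches (e.g., RGB vs. thermal), so that
# neither feature extractor dominates the fused representation during training.
import torch
import torch.nn as nn


class ModalityBalanceRegularizer(nn.Module):
    """Penalizes the imbalance between two modality feature maps.

    The penalty is the squared difference of the batch-averaged L2 norms of the
    two feature maps; `weight` scales its contribution to the total loss.
    """

    def __init__(self, weight: float = 0.1):
        super().__init__()
        self.weight = weight

    def forward(self, feat_rgb: torch.Tensor, feat_thermal: torch.Tensor) -> torch.Tensor:
        # Batch-averaged L2 norm of each modality's features.
        norm_rgb = feat_rgb.flatten(1).norm(dim=1).mean()
        norm_thermal = feat_thermal.flatten(1).norm(dim=1).mean()
        return self.weight * (norm_rgb - norm_thermal) ** 2


if __name__ == "__main__":
    # Toy usage with hypothetical feature maps from two backbones.
    reg = ModalityBalanceRegularizer(weight=0.1)
    feat_rgb = torch.randn(4, 256, 20, 20)
    feat_thermal = torch.randn(4, 256, 20, 20) * 0.5  # weaker modality
    detection_loss = torch.tensor(1.0)                # placeholder task loss
    total_loss = detection_loss + reg(feat_rgb, feat_thermal)
    print(total_loss.item())
```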