Object detection with multimodal inputs can improve many safety-critical systems such as autonomous vehicles (AVs). Motivated by AVs that operate both day and night, we study multimodal object detection with RGB and thermal cameras, since the latter provide much stronger object signatures under poor illumination. We explore strategies for fusing information from different modalities. Our key contribution is a probabilistic ensembling technique, ProbEn, a simple, non-learned method that fuses detections from multiple modalities. We derive ProbEn from Bayes' rule and first principles that assume conditional independence across modalities. Through probabilistic marginalization, ProbEn elegantly handles missing modalities when detectors do not fire on the same object. Importantly, ProbEn also notably improves multimodal detection even when the conditional independence assumption does not hold, e.g., when fusing outputs from other fusion methods (both off-the-shelf and trained in-house). We validate ProbEn on two benchmarks, one with aligned (KAIST) and one with unaligned (FLIR) multimodal images, showing that ProbEn outperforms prior work by more than 13% in relative performance.
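The Bayes-rule fusion and the marginalization of missing modalities described above can be illustrated with a minimal sketch; this is not the authors' implementation. It fuses class posteriors for a single object whose detections are assumed to have already been matched across modalities (matching and box fusion are omitted). Under conditional independence, p(y | x_1, ..., x_M) ∝ p(y)^(1-M) · Π_m p(y | x_m). The function name `proben_fuse`, the use of NumPy, the class prior `prior`, and passing `None` for a modality whose detector did not fire are all illustrative assumptions.

```python
from typing import Optional, Sequence

import numpy as np


def proben_fuse(posteriors: Sequence[Optional[np.ndarray]], prior: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: fuse per-modality class posteriors p(y | x_m) for one object.

    Bayes' rule with conditional independence across modalities gives
        p(y | x_1..x_M) ∝ p(y)^(1 - M) * prod_m p(y | x_m).
    A modality whose detector did not fire is passed as None; marginalizing
    it out means it contributes no factor to the product.
    Posteriors (e.g. softmax outputs) are assumed strictly positive.
    """
    observed = [np.asarray(p, dtype=float) for p in posteriors if p is not None]
    if not observed:
        raise ValueError("at least one modality must provide a detection")
    prior = np.asarray(prior, dtype=float)

    # Work in log space for numerical stability.
    log_post = np.sum([np.log(p) for p in observed], axis=0)
    # Divide out the prior once per extra observed modality: p(y)^(1 - M).
    log_post -= (len(observed) - 1) * np.log(prior)
    log_post -= log_post.max()

    fused = np.exp(log_post)
    return fused / fused.sum()


if __name__ == "__main__":
    rgb = np.array([0.6, 0.3, 0.1])       # hypothetical RGB posterior
    thermal = np.array([0.7, 0.2, 0.1])   # hypothetical thermal posterior
    uniform = np.full(3, 1 / 3)           # assumed uniform class prior
    print(proben_fuse([rgb, thermal], uniform))  # both modalities fired
    print(proben_fuse([rgb, None], uniform))     # thermal missing
```

With a uniform prior, two modalities that agree reinforce each other: the top-class score rises to 42/49 ≈ 0.857, above either input (0.6 and 0.7). When the thermal detector does not fire, marginalization leaves the RGB posterior unchanged. A non-uniform prior (for example, one estimated from training label frequencies, which is again an assumption here) is divided out once per additional observed modality.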