Object detection with multimodal inputs can improve many safety-critical perception systems such as autonomous vehicles (AVs). Motivated by AVs that operate in both day and night, we study multimodal object detection with RGB and thermal cameras, since the latter provides much stronger object signatures under poor illumination. We explore strategies for fusing information across modalities. Our key contribution is a non-learned late-fusion method that fuses bounding box detections from different modalities via a simple probabilistic model derived from first principles. Our approach, which we call Bayesian Fusion, follows readily from conditional independence assumptions across modalities. We apply it to benchmarks containing both aligned (KAIST) and unaligned (FLIR) multimodal sensor data. Our Bayesian Fusion outperforms prior work by more than 13% in relative performance.
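The fusion rule described above can be sketched in code. The abstract does not spell out the exact formula, so the following is a minimal illustration under the stated conditional independence assumption: if each modality's detector outputs a posterior score p(y=1 | x_i) for a matched box, Bayes' rule with independent modalities gives p(y | x_1..x_n) ∝ Π_i p(y | x_i) / p(y)^(n-1). The function name and the uniform prior are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def bayesian_fusion(scores, prior=0.5):
    """Fuse per-modality detection scores for one matched box.

    scores : per-modality posteriors p(y=1 | x_i) for the same object.
    prior  : class prior p(y=1); 0.5 is an illustrative assumption.

    Under conditional independence of modalities given the label:
        p(y | x_1..x_n) ∝ prod_i p(y | x_i) / p(y)^(n-1)
    Normalizing over y ∈ {0, 1} yields the fused confidence.
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    pos = np.prod(scores) / prior ** (n - 1)
    neg = np.prod(1.0 - scores) / (1.0 - prior) ** (n - 1)
    return pos / (pos + neg)

# Agreeing modalities reinforce each other: fusing 0.8 (RGB)
# and 0.7 (thermal) gives a fused score above either input.
fused = bayesian_fusion([0.8, 0.7])
```

Note that with a single modality the rule reduces to the detector's own score, and two confident, agreeing detections produce a score higher than either alone — the behavior one wants from late fusion.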