Existing top-performing 3D object detectors typically rely on a multi-modal fusion strategy. This design, however, is fundamentally restricted because it overlooks modality-specific useful information, ultimately hampering model performance. To address this limitation, in this work we introduce a novel modality interaction strategy in which individual per-modality representations are learned and maintained throughout, enabling their unique characteristics to be exploited during object detection. To realize this strategy, we design a DeepInteraction architecture characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Experiments on the large-scale nuScenes dataset show that our proposed method surpasses all prior arts, often by a large margin. Crucially, our method ranks first on the highly competitive nuScenes object detection leaderboard.