Autonomous driving demands accurate perception and safe decision-making. To achieve this, automated vehicles are now equipped with multiple sensors (e.g., camera, LiDAR), enabling them to exploit complementary environmental context by fusing data from different sensing modalities. With the success of deep convolutional neural networks (DCNNs), fusion between DCNNs has proven a promising strategy for achieving satisfactory perception accuracy. However, mainstream DCNN fusion schemes conduct fusion by directly adding feature maps extracted from different modalities element-wise at various stages, without considering whether the features being fused actually match. We therefore first propose a feature disparity metric to quantitatively measure the degree of disparity between the feature maps being fused. We then propose Fusion-filter, a feature-matching technique that tackles the feature-mismatch issue. We also propose a Layer-sharing technique for the deep layers that achieves better accuracy with less computational overhead. With the feature disparity serving as an additional loss, our proposed techniques enable a DCNN to learn corresponding feature maps with similar characteristics and complementary visual context from different modalities, achieving better accuracy. Experimental results demonstrate that our proposed fusion technique achieves better accuracy on the KITTI dataset while demanding fewer computational resources.
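To make the contrast concrete, the sketch below shows the mainstream element-wise-addition fusion the abstract critiques, alongside one possible form of a feature disparity metric. The abstract does not define the metric, so the mean-absolute-difference of normalized feature maps used here is purely a hypothetical stand-in, and all function names are illustrative.

```python
import numpy as np

def feature_disparity(f_a, f_b, eps=1e-8):
    """Hypothetical disparity metric: mean absolute difference of
    normalized feature maps. The paper's actual definition is not
    given in this abstract; this is only an illustrative proxy."""
    a = (f_a - f_a.mean()) / (f_a.std() + eps)
    b = (f_b - f_b.mean()) / (f_b.std() + eps)
    return float(np.abs(a - b).mean())

def elementwise_fusion(f_cam, f_lidar):
    """Mainstream fusion scheme: direct element-wise addition of
    feature maps from two modalities, with no check on whether the
    features being fused are matched."""
    return f_cam + f_lidar

# Toy feature maps shaped (channels, height, width)
rng = np.random.default_rng(0)
f_cam = rng.standard_normal((8, 4, 4))
f_lidar = rng.standard_normal((8, 4, 4))

fused = elementwise_fusion(f_cam, f_lidar)
d = feature_disparity(f_cam, f_lidar)  # could serve as an auxiliary loss term
```

Under this reading, a high disparity value would signal mismatched features, which the proposed Fusion-filter is meant to address before (or instead of) blind element-wise addition.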