In this paper, we propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention. Current anchor-based monocular 3D object detection methods suffer from feature mismatching. To overcome this, we propose a two-step feature alignment approach. In the first step, shape alignment enables the receptive field of the feature map to focus on the pre-defined anchors with high confidence scores. In the second step, center alignment aligns the features at the 2D/3D centers. Further, it is often difficult to learn global information and capture long-range relationships, both of which are important for predicting object depth. Therefore, we propose a novel asymmetric non-local attention block with multi-scale sampling to extract depth-wise features. The proposed M3DSSD achieves significantly better performance than existing monocular 3D object detection methods on the KITTI dataset, in both the 3D object detection and bird's eye view tasks.
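To make the idea of asymmetric non-local attention with multi-scale sampling more concrete, the following is a minimal NumPy sketch, not the M3DSSD implementation. It assumes one common reading of "asymmetric": every spatial position acts as a query, while keys and values come from a much smaller set of features obtained by average-pooling the map at several scales. The function names, the pooling sizes `(1, 2, 4)`, the `sqrt(C)` scaling, and the residual connection are all illustrative assumptions; the actual block presumably also uses learned projections, which are omitted here.

```python
import numpy as np


def avg_pool_to(feat, out_size):
    """Adaptive average pooling of a (C, H, W) map to (C, out_size, out_size)."""
    c, h, w = feat.shape
    pooled = np.empty((c, out_size, out_size), dtype=float)
    for i in range(out_size):
        # Bin i covers rows [floor(i*h/out), ceil((i+1)*h/out)).
        r0, r1 = (i * h) // out_size, -((-(i + 1) * h) // out_size)
        for j in range(out_size):
            c0, c1 = (j * w) // out_size, -((-(j + 1) * w) // out_size)
            pooled[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return pooled


def asymmetric_nonlocal(feat, pool_sizes=(1, 2, 4)):
    """Attend from all H*W positions to a small multi-scale set of samples.

    Hypothetical sketch: no learned query/key/value projections are applied.
    Returns the refined (C, H, W) map and the (H*W, S) attention weights,
    where S = sum(s * s for s in pool_sizes).
    """
    c, h, w = feat.shape
    query = feat.reshape(c, h * w).T  # (N, C), one query per position

    # Multi-scale sampling: pooled features at each scale serve as both
    # keys and values, so S is much smaller than N for realistic map sizes.
    sampled = np.concatenate(
        [avg_pool_to(feat, s).reshape(c, -1) for s in pool_sizes], axis=1
    )  # (C, S)

    # Scaled dot-product similarity followed by a numerically stable softmax.
    logits = query @ sampled / np.sqrt(c)  # (N, S)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    context = weights @ sampled.T  # (N, C)
    # Residual connection (an assumption of this sketch).
    return feat + context.T.reshape(c, h, w), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.standard_normal((8, 6, 10))
    refined, attn = asymmetric_nonlocal(feature_map)
    print(refined.shape, attn.shape)  # (8, 6, 10) (60, 21)
```

In this sketch the attention matrix has shape N x S rather than N x N, which is the practical appeal of the asymmetric form: the cost grows with the number of pooled samples rather than with the square of the number of positions.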