Monocular 3D object detection is a promising research topic for the intelligent perception systems of autonomous driving. In this work, a single-stage keypoint-based network, named FADNet, is presented to address this task. In contrast to previous keypoint-based methods, which adopt identical layouts for all output branches, we propose to divide the output modalities into groups according to their estimation difficulty and to treat the groups differently through sequential feature association. Another contribution of this work is the strategy of depth hint augmentation. To provide characterized depth patterns as hints for depth estimation, a dedicated depth hint module is designed to generate row-wise features, named depth hints, which are explicitly supervised in a bin-wise manner. In the training stage, the regression outputs are uniformly encoded to enable loss disentanglement. The 2D loss term is further adapted to be depth-aware to improve the detection accuracy of small objects. The contributions of this work are validated through experiments and an ablation study on the KITTI benchmark. Without utilizing depth priors, post-optimization, or other refinement modules, our network performs competitively against state-of-the-art methods while maintaining a decent running speed.
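To make the notion of bin-wise supervision concrete, the following is a minimal sketch and not the FADNet implementation. The abstract does not state the bin layout, the depth range, or the loss used for the depth hints, so the uniform discretization, the 0 to 80 m range, the bin count, and the cross-entropy loss below are all assumptions, and the names `depth_to_bin` and `bin_cross_entropy` are hypothetical.

```python
import math


def depth_to_bin(depth, d_min=0.0, d_max=80.0, num_bins=8):
    """Map a metric depth to a bin index.

    Assumption: uniform bins over [d_min, d_max]; the abstract does not
    specify the discretization actually used for the depth hints.
    """
    if num_bins <= 0 or d_max <= d_min:
        raise ValueError("need num_bins > 0 and d_max > d_min")
    bin_size = (d_max - d_min) / num_bins
    idx = int((depth - d_min) // bin_size)
    # Clamp out-of-range depths into the first or last bin.
    return min(max(idx, 0), num_bins - 1)


def bin_cross_entropy(logits, target_bin):
    """Cross-entropy between one set of bin logits and a target bin.

    Assumption: a classification-style loss stands in for the paper's
    bin-wise supervision. Uses the log-sum-exp trick for stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_bin]


# Hypothetical usage: supervise the bin logits predicted for one image row.
row_logits = [0.0] * 8
target = depth_to_bin(25.0)
loss = bin_cross_entropy(row_logits, target)
```

In this sketch a depth of 25 m falls into the third bin (index 2), and uniform logits over eight bins give a loss of ln 8. Casting depth as a per-row classification over bins is one common way to supervise a coarse depth pattern; other discretizations (for example, bins that grow with distance) would fit the same interface.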