Monocular 3D object detection is a fundamental task in many applications, including autonomous driving, robotic grasping, and augmented reality. Leading existing methods tend to first estimate depth from the input image and then detect 3D objects from the resulting point cloud. This pipeline suffers from the inherent gap between depth estimation and object detection, and prediction errors that accumulate across the two stages further degrade performance. In this paper, we propose a novel method named MonoPCNS. The key insight is to have the monocular detector simulate the feature learning behavior of a point cloud based detector during training, so that at inference time its learned features and predictions are as similar as possible to those of the point cloud based detector. To achieve this, we propose a scene-level simulation module, an RoI-level simulation module, and a response-level simulation module, which are applied progressively across the detector's full feature learning and prediction pipeline. We apply our method to the well-known M3D-RPN and CaDDN detectors and conduct extensive experiments on the KITTI and Waymo Open datasets. Results show that our method consistently improves the performance of different monocular detectors by a large margin without changing their network architectures, and that it achieves state-of-the-art performance.
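To make the three simulation levels concrete, the following is a minimal sketch of how a combined training objective could be assembled. It is not the loss used by MonoPCNS: the abstract does not specify the loss functions, so plain mean squared error is swapped in at every level, and all names here (`simulation_loss`, `crop`, the `weights` parameter, the single-channel grid representation) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]          # toy single-channel feature map
Roi = Tuple[int, int, int, int]   # (row0, col0, row1, col1), half-open


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def flatten(grid: Grid) -> List[float]:
    """Flatten a 2D feature map row by row."""
    return [v for row in grid for v in row]


def crop(grid: Grid, roi: Roi) -> List[float]:
    """Flattened features inside one region of interest."""
    r0, c0, r1, c1 = roi
    return [v for row in grid[r0:r1] for v in row[c0:c1]]


def simulation_loss(
    student_feat: Grid,
    teacher_feat: Grid,
    rois: Sequence[Roi],
    student_pred: Sequence[float],
    teacher_pred: Sequence[float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of scene-, RoI-, and response-level mimicking terms.

    Assumption: each term is a plain MSE between the monocular (student)
    and point cloud based (teacher) detector outputs.
    """
    # Scene level: match the whole feature map.
    scene = mse(flatten(student_feat), flatten(teacher_feat))
    # RoI level: match features inside each object region, averaged over RoIs.
    roi = 0.0
    if rois:
        roi = sum(
            mse(crop(student_feat, r), crop(teacher_feat, r)) for r in rois
        ) / len(rois)
    # Response level: match the final predictions.
    response = mse(student_pred, teacher_pred)
    w_scene, w_roi, w_resp = weights
    return w_scene * scene + w_roi * roi + w_resp * response
```

In this sketch the teacher outputs are treated as fixed targets, and the term would be added to the monocular detector's usual detection loss during training only. That is consistent with the claim that the network architecture is unchanged at inference.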