Precisely localizing 3D objects from a single image without depth information is a highly challenging problem. Most existing methods apply the same approach to all objects regardless of their diverse distributions, which limits performance on truncated objects. In this paper, we propose a flexible framework for monocular 3D object detection that explicitly decouples truncated objects and adaptively combines multiple approaches for object depth estimation. Specifically, we decouple the edge of the feature map for predicting long-tail truncated objects so that the optimization of normal objects is not affected. Furthermore, we formulate object depth estimation as an uncertainty-guided ensemble of the directly regressed object depth and the depths solved from different groups of keypoints. Experiments demonstrate that our method outperforms the state-of-the-art method by a relative 27\% at the moderate level and 30\% at the hard level on the test set of the KITTI benchmark while maintaining real-time efficiency. Code will be available at \url{https://github.com/zhangyp15/MonoFlex}.
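As a rough illustration of what an uncertainty-guided ensemble of depth estimates could look like, the sketch below combines several depth candidates by weighting each with the inverse of its predicted uncertainty. The abstract does not state the weighting rule or how depths are solved from keypoints, so the inverse-uncertainty weighting, the pinhole height relation, and the function names `depth_from_height` and `ensemble_depth` are all assumptions made for illustration.

```python
def depth_from_height(focal_px, height_3d_m, height_px):
    """Assumed pinhole relation: depth = focal length * 3D height / pixel height."""
    return focal_px * height_3d_m / height_px


def ensemble_depth(depths, sigmas):
    """Combine depth estimates, weighting each by 1 / sigma (an assumed rule).

    depths: candidate depths in meters (e.g. one regressed, several solved).
    sigmas: positive uncertainty predicted for each candidate.
    """
    if len(depths) != len(sigmas) or not depths:
        raise ValueError("depths and sigmas must be non-empty and equal in length")
    weights = [1.0 / s for s in sigmas]
    weighted_sum = sum(w * z for w, z in zip(weights, depths))
    return weighted_sum / sum(weights)


# Hypothetical values: one directly regressed depth, one solved from keypoints.
regressed = 20.0
solved = depth_from_height(focal_px=700.0, height_3d_m=1.5, height_px=50.0)
combined = ensemble_depth([regressed, solved], [1.0, 3.0])
```

With these hypothetical inputs, the less certain keypoint-based estimate pulls the combined depth only slightly away from the regressed one, which is the intended effect of weighting by confidence.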