This paper presents a 3D object detection model for 2D images taken by monocular cameras. The model combines an estimated bird's-eye view elevation map with deep representations of object features. It uses a pre-trained ResNet-50 network as its backbone, followed by three additional branches. The model first builds a bird's-eye view elevation map to estimate the depth of each object in the scene, then uses that depth to estimate the object's 3D bounding box. We trained and evaluated the model on two datasets: a synthetic dataset and the KITTI dataset.
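The abstract does not say how an estimated depth is turned into a 3D box position. As a minimal sketch of one standard way this step can be done, the following lifts a pixel with a known depth into camera coordinates using the pinhole camera model. The names `Intrinsics` and `backproject` and the numeric values in the usage note are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical container, not from the paper)."""
    fx: float  # focal length along the image x axis
    fy: float  # focal length along the image y axis
    cx: float  # principal point, x coordinate
    cy: float  # principal point, y coordinate


def backproject(u: float, v: float, depth: float, k: Intrinsics) -> Tuple[float, float, float]:
    """Lift pixel (u, v) with depth Z along the optical axis to camera coordinates (X, Y, Z).

    Assumes an ideal pinhole camera with no lens distortion.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)
```

For example, with assumed intrinsics `Intrinsics(fx=700.0, fy=700.0, cx=600.0, cy=180.0)`, a pixel at the principal point with a depth of 10 m maps to a point 10 m straight ahead on the optical axis, and a pixel 70 px to the right at the same depth maps to a point 1 m to the right.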