This paper addresses the problem of learning to estimate the depth of detected objects given some measurement of camera motion (e.g., from robot kinematics or vehicle odometry). We achieve this by 1) designing a recurrent neural network (DBox) that estimates the depth of objects using a generalized representation of bounding boxes and uncalibrated camera movement and 2) introducing the Object Depth via Motion and Detection Dataset (ODMD). ODMD training data are extensible and configurable, and the ODMD benchmark includes 21,600 examples across four validation and test sets. These sets include mobile robot experiments using an end-effector camera to locate objects from the YCB dataset and examples with perturbations added to camera motion or bounding box data. In addition to the ODMD benchmark, we evaluate DBox in other monocular application domains, achieving state-of-the-art results on existing driving and robotics benchmarks and estimating the depth of objects using a camera phone.
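As background for the problem setup, the following is a minimal illustrative sketch of the geometric relationship that bounding boxes and camera motion make available in the simplest case (a pinhole-model, optical-axis-only derivation; this is not the DBox network, and the function name is hypothetical):

```python
# Illustrative sketch only (assumes a pinhole camera and pure forward motion):
# an object's apparent bounding-box height is h_i = f * H / z_i. If the camera
# moves a known distance d toward the object between two frames, z_2 = z_1 - d,
# so the scale change of the box determines the initial depth:
#     z_1 = d / (1 - h_1 / h_2).
def depth_from_scale_change(h1: float, h2: float, d: float) -> float:
    """Estimate object depth in frame 1 from bounding-box heights h1, h2
    (pixels) and forward camera displacement d (meters)."""
    if h2 <= h1:
        raise ValueError("Object must appear larger after moving toward it.")
    return d / (1.0 - h1 / h2)

# Example: a box grows from 100 px to 125 px after the camera advances 0.5 m,
# implying the object was initially 2.5 m away.
print(depth_from_scale_change(100.0, 125.0, 0.5))  # 2.5
```

A learned model such as DBox generalizes beyond this idealized two-frame case, e.g., to noisy detections, perturbed or uncalibrated motion, and arbitrary camera movement directions.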