We propose a Visual Teach and Repeat (VTR) algorithm that uses semantic landmarks extracted from environmental objects for ground robots with fixed-mount monocular cameras. The proposed algorithm is robust to changes in the starting pose of the camera/robot, where a pose is defined as the planar position plus the orientation around the vertical axis. VTR consists of a teach phase, in which a robot moves along a prescribed path, and a repeat phase, in which the robot tries to repeat the same path starting from the same or a different pose. Most available VTR algorithms are pose dependent and perform poorly in the repeat phase when starting from an initial pose far from that of the teach phase. To achieve more robust pose independence, the key is to generate, during the teach phase, a 3D semantic map of the environment containing the camera trajectory and the positions of surrounding objects. In our implementation, we use ORB-SLAM to collect the camera poses and the 3D point cloud of the environment, and YOLOv3 to detect objects in the environment; we then combine the two outputs to build the semantic map. In the repeat phase, we relocalize the robot based on the detected objects and the stored semantic map. The robot is then able to move toward the teach path and repeat it in both the forward and backward directions. We have tested the proposed algorithm in different scenarios and compared it with the two most relevant recent studies. We also compared our algorithm with two image-based relocalization methods: one purely based on ORB-SLAM and one that combines SuperGlue and RANSAC. The results show that our algorithm is much more robust to pose variations as well as environmental alterations. Our code and data are available at the following GitHub page: https://github.com/mmahdavian/semantic_visual_teach_repeat.
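The fusion step described above, which combines SLAM map points with object detections to place objects in the 3D semantic map, can be illustrated with a minimal sketch. This is not the paper's exact implementation: the pinhole projection with an identity camera pose, the focal length and principal point values, and the averaging heuristic are all assumptions for illustration.

```python
# Illustrative sketch (assumed, not the paper's code): estimate an object's 3D
# position by averaging the SLAM map points whose image projections fall inside
# a detector's bounding box. Assumes the points are already expressed in the
# camera frame and a simple pinhole model with focal length f and principal
# point (cx, cy).

def project(point, f=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of a 3D point (camera frame, z > 0) to pixel coords."""
    x, y, z = point
    return (f * x / z + cx, f * y / z + cy)

def object_position(map_points, bbox, f=500.0, cx=320.0, cy=240.0):
    """Average the map points that project inside bbox = (u_min, v_min, u_max, v_max)."""
    u_min, v_min, u_max, v_max = bbox
    inside = []
    for p in map_points:
        if p[2] <= 0:  # skip points behind the camera
            continue
        u, v = project(p, f, cx, cy)
        if u_min <= u <= u_max and v_min <= v <= v_max:
            inside.append(p)
    if not inside:
        return None  # no supporting map points for this detection
    n = len(inside)
    return tuple(sum(p[i] for p in inside) / n for i in range(3))

# Toy example: two map points lie on the detected object, one elsewhere.
points = [(0.1, 0.0, 2.0), (0.2, 0.0, 2.0), (2.0, 0.0, 2.0)]
bbox = (300.0, 200.0, 400.0, 280.0)  # pixel coordinates of the detection
pos = object_position(points, bbox)  # averages the two points inside the box
```

In practice the map points would come from ORB-SLAM and the boxes from YOLOv3; a robust estimator (e.g. rejecting background outliers) would replace the plain average, but the association-by-projection idea is the same.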