In this paper we present a data-driven approach to obtain the static image of a scene, eliminating dynamic objects that might have been present at the time of traversing the scene with a camera. The general objective is to improve vision-based localization and mapping tasks in dynamic environments, where the presence (or absence) of different dynamic objects at different times makes these tasks less robust. We introduce an end-to-end deep learning framework to turn images of an urban environment that include dynamic content, such as vehicles or pedestrians, into realistic static frames suitable for localization and mapping. This objective faces two main challenges: detecting the dynamic objects, and inpainting the static occluded background. The first challenge is addressed by the use of a convolutional network that learns a multi-class semantic segmentation of the image. The second challenge is approached with a generative adversarial model that, taking as input the original dynamic image and the computed dynamic/static binary mask, is capable of generating the final static image. This framework makes use of two new losses, one based on image steganalysis techniques, useful to improve the inpainting quality, and another one based on ORB features, designed to enhance feature matching between real and hallucinated image regions. To validate our approach, we perform an extensive evaluation with the hallucinated images on different tasks that are affected by dynamic entities, i.e., visual odometry, place recognition and multi-view stereo. Code has been made available at https://github.com/bertabescos/EmptyCities_SLAM.
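As a schematic illustration of the mask-conditioned inpainting step (a minimal sketch, not the authors' released implementation), a common convention is to keep the observed static pixels and let the generator's output fill only the regions marked dynamic by the segmentation mask. The function name `composite_static` and the toy arrays below are hypothetical:

```python
import numpy as np

def composite_static(dynamic_img, dynamic_mask, generated_img):
    """Keep observed static pixels; fill dynamic regions with the
    generator's hallucination. Hypothetical helper for illustration.
    dynamic_img, generated_img: HxWx3 float arrays.
    dynamic_mask: HxW array, 1 where pixels are dynamic."""
    mask = dynamic_mask.astype(bool)[..., None]  # HxW -> HxWx1 for broadcasting
    return np.where(mask, generated_img, dynamic_img)

# Toy example: 4x4 RGB image, top-left 2x2 block marked dynamic.
img = np.full((4, 4, 3), 0.5)   # observed (dynamic) frame
gen = np.full((4, 4, 3), 0.9)   # generator output
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :2] = 1
out = composite_static(img, mask, gen)
print(out[0, 0, 0], out[3, 3, 0])  # dynamic pixel filled, static pixel kept
```

In the paper's framework the generator itself is conditioned on both the image and the mask; this sketch only shows the final compositing convention, under the stated assumptions.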