Many recent works in the literature introduce semantic mapping methods that use CNNs (Convolutional Neural Networks) to recognize semantic properties in images. The types of properties (e.g., room size, place category, and objects) and their classes (e.g., kitchen and bathroom, for place category) are usually predefined and restricted to a specific task. Thus, all the visual data acquired and processed during the construction of the maps is lost, and only the recognized semantic properties remain on the maps. In contrast, this work introduces a topological semantic mapping method that uses deep visual features, extracted by a CNN (GoogLeNet) from 2D images captured from multiple views of the environment as the robot operates, to create, through averaging, consolidated representations of the visual features acquired in the region covered by each topological node. These representations allow flexible recognition of the semantic properties of the regions and can also be used in other visual tasks. Experiments with a real-world indoor dataset showed that the method is able to consolidate the visual features of regions and use them to recognize objects and place categories as semantic properties, as well as to indicate the topological location of images, with very promising results.
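To make the consolidation step concrete, the following is a minimal sketch, not the authors' implementation: it assumes torchvision's pretrained GoogLeNet as the feature extractor (the head replaced with an identity layer, yielding the 1024-d pre-classification feature vector) and keeps an incremental running average of the features of all images observed in a node's region. The TopologicalNode class and integrate method are illustrative names, not from the paper.

    import torch
    import torchvision.models as models
    import torchvision.transforms as T

    # GoogLeNet backbone; replacing the classifier head with Identity
    # exposes the 1024-d feature vector that precedes classification.
    backbone = models.googlenet(weights=models.GoogLeNet_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    preprocess = T.Compose([
        T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),
    ])

    class TopologicalNode:
        """Consolidated representation of one region of the map."""
        def __init__(self):
            self.mean_feature = torch.zeros(1024)  # running average
            self.count = 0

        def integrate(self, pil_image):
            """Fold one view of the region into the node's average."""
            with torch.no_grad():
                f = backbone(preprocess(pil_image).unsqueeze(0)).squeeze(0)
            self.count += 1
            # incremental mean: m_k = m_{k-1} + (f - m_{k-1}) / k
            self.mean_feature += (f - self.mean_feature) / self.count

Under this reading, semantic properties can be recognized by feeding each node's mean_feature to a downstream classifier, and a query image can be localized topologically by comparing its feature vector against the nodes' averages (e.g., by cosine similarity).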