Representation learning of spatial and geographic data is a rapidly developing field that enables similarity detection between areas and high-quality inference with deep neural networks. Past approaches, however, have concentrated on embedding raster imagery (maps, street-level or satellite photos), mobility data, or road networks. In this paper we propose the first approach to learning vector representations of OpenStreetMap (OSM) regions with respect to urban functions and land use in a micro-region grid. We identify a subset of OSM tags related to the major characteristics of land use, building and urban region functions, and types of water, green, or other natural areas. After manually verifying tagging quality, we selected 36 cities for training region representations. Uber's H3 index was used to divide the cities into hexagons, and OSM tags were aggregated for each hexagon. We propose the hex2vec method, based on the Skip-gram model with negative sampling. The resulting vector representations exhibit semantic structures of the map characteristics, similar to those found in vector-based language models. We also present insights from region similarity detection in six Polish cities and propose a region typology obtained through agglomerative clustering.
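The abstract names the Skip-gram model with negative sampling but does not spell out the training objective. The following is a minimal sketch of that objective applied to hexagons, under several assumptions that are not stated in the abstract: each hexagon is represented by a vector of aggregated OSM tag counts, a hypothetical linear encoder maps that vector to an embedding, positive contexts are adjacent hexagons, and negatives are random non-adjacent hexagons. Hexagons are modelled on a toy axial-coordinate grid rather than real H3 cells, and all names (`encode`, `sample_pairs`, `sgns_loss`, `HEX_NEIGHBOURS`) are illustrative, not taken from hex2vec or the H3 library. The sketch only evaluates the loss; it performs no gradient updates.

```python
import math
import random

# Axial-coordinate offsets of the six neighbours of a hexagon. This toy grid
# stands in for H3 cells (an illustrative assumption, not the H3 API).
HEX_NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def encode(tag_counts, weights):
    """Hypothetical linear encoder: one weight row per embedding dimension."""
    return [dot(row, tag_counts) for row in weights]


def sample_pairs(cells, num_negatives, rng):
    """Yield (target, positive, negatives) triples of hexagon ids.

    Positives are adjacent hexagons; negatives are drawn from hexagons that
    are neither the target nor one of its neighbours (assumed scheme).
    """
    cell_set = set(cells)
    for q, r in cells:
        neighbours = [(q + dq, r + dr) for dq, dr in HEX_NEIGHBOURS
                      if (q + dq, r + dr) in cell_set]
        if not neighbours:
            continue
        excluded = set(neighbours) | {(q, r)}
        candidates = [c for c in cells if c not in excluded]
        if len(candidates) < num_negatives:
            continue
        yield (q, r), rng.choice(neighbours), rng.sample(candidates, num_negatives)


def sgns_loss(target, positive, negatives) -> float:
    """Skip-gram negative-sampling loss for one target/context pair:
    -log sigmoid(t . p) - sum_k log sigmoid(-t . n_k)."""
    loss = -log_sigmoid(dot(target, positive))
    for neg in negatives:
        loss -= log_sigmoid(-dot(target, neg))
    return loss


# Usage on a 4x4 patch of hexagons with hypothetical 3-tag count vectors
# (e.g. building, park, water) and a random 2-dimensional linear encoder.
rng = random.Random(0)
cells = [(q, r) for q in range(4) for r in range(4)]
tags = {c: [rng.randint(0, 5) for _ in range(3)] for c in cells}
weights = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]
emb = {c: encode(tags[c], weights) for c in cells}
pairs = list(sample_pairs(cells, 2, rng))
total_loss = sum(sgns_loss(emb[t], emb[p], [emb[n] for n in negs])
                 for t, p, negs in pairs)
```

Training would minimise `total_loss` with respect to the encoder weights, pulling embeddings of neighbouring hexagons together and pushing random distant ones apart; the log-sigmoid form avoids `log(0)` for large-magnitude dot products.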