A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to the process of recognizing location references in texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Further, there is no comprehensive review and comparison of existing approaches for location reference recognition, which is the first and a core step of geoparsing. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition, categorizing them into four groups based on their underlying functional principles: rule-based, gazetteer matching-based, statistical learning-based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition on 26 public datasets that cover different types of texts (e.g., social media posts and news stories) and contain 39,736 location references from across the world. The results of this evaluation can inform future methodological developments for location reference recognition and guide the selection of appropriate approaches based on application needs.
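To make the gazetteer matching-based category more concrete, the following is a minimal sketch. It is not one of the 27 evaluated approaches. It assumes a tiny hypothetical gazetteer (real systems draw on large resources such as GeoNames), greedy longest-match lookup over word n-grams, and exact-span precision, recall, and F1 as the correctness measure. The abstract does not state which metrics the evaluation uses. The names `GAZETTEER`, `recognize_locations`, and `precision_recall_f1` are illustrative only.

```python
import re
from typing import List, Set, Tuple

# Hypothetical toy gazetteer of lowercased place names (illustrative only).
GAZETTEER: Set[str] = {"new york", "new york city", "buffalo", "paris", "texas"}

Span = Tuple[int, int, str]  # (start_char, end_char, surface_text)


def recognize_locations(text: str, gazetteer: Set[str], max_len: int = 5) -> List[Span]:
    """Greedy longest-match lookup of word n-grams against a gazetteer."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        matched_n = 0
        # Try the longest n-gram first so "New York City" wins over "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(text[s:e].lower() for s, e in tokens[i:i + n])
            if key in gazetteer:
                matched_n = n
                break
        if matched_n:
            start, end = tokens[i][0], tokens[i + matched_n - 1][1]
            spans.append((start, end, text[start:end]))
            i += matched_n  # skip past the matched tokens
        else:
            i += 1
    return spans


def precision_recall_f1(predicted: List[Span], gold: List[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Exact-span-match precision, recall, and F1 (an assumed correctness measure)."""
    pred = {(s, e) for s, e, _ in predicted}
    gold_set = set(gold)
    tp = len(pred & gold_set)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    text = "Flooding hit New York City and Buffalo; Paris Hilton posted from Springfield."
    pred = recognize_locations(text, GAZETTEER)
    print([surface for _, _, surface in pred])  # ['New York City', 'Buffalo', 'Paris']
    gold = [(text.index(p), text.index(p) + len(p)) for p in ("New York City", "Buffalo", "Springfield")]
    print([round(x, 3) for x in precision_recall_f1(pred, gold)])  # [0.667, 0.667, 0.667]
```

The example shows two typical weaknesses of pure gazetteer matching. First, "Paris" in "Paris Hilton" becomes a false positive because lookup ignores context. Second, "Springfield" is missed because it is absent from the gazetteer. Context-aware statistical learning-based and hybrid approaches are commonly used to reduce such errors.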