Mining Points of Interest via Address Embeddings: An Unsupervised Approach

Digital maps are used worldwide to explore places that users are interested in, commonly referred to as points of interest (PoIs). On online food delivery platforms, a PoI can be any major private compound from which customers place orders, such as a hospital, residential complex, office complex, educational institute, or hostel. In this work, we propose an end-to-end unsupervised system design for obtaining polygon representations of PoIs (PoI polygons) from address locations and address texts. We preprocess the address texts using locality names and generate embeddings for them using a deep learning-based architecture, viz. RoBERTa, trained on our internal address dataset. PoI candidates are identified by jointly clustering the anonymised customer phone GPS locations (obtained during address onboarding) and the embeddings of the address texts. The final list of PoI polygons is obtained from these candidates using novel post-processing steps. This algorithm identified 74.8% more PoIs than the Mummidi-Krumm baseline algorithm run on our internal dataset. The proposed algorithm achieves a median area precision of 98%, a median area recall of 8%, and a median F-score of 0.15. To improve the recall of the algorithmic polygons, we post-process them using building footprint polygons from the OpenStreetMap (OSM) database. This post-processing reshapes each algorithmic polygon using intersecting polygons and closed private roads from the OSM database, and accounts for intersections with public roads in the OSM database. On the post-processed polygons, we achieve a median area recall of 70%, a median area precision of 69%, and a median F-score of 0.69.
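The combination of very high area precision and very low area recall reported above is what one would expect when a predicted polygon sits well inside a much larger ground-truth polygon. The sketch below illustrates this under stated assumptions: the abstract does not define the metrics, so area precision is taken here to be the overlap area divided by the predicted area, area recall the overlap area divided by the ground-truth area, and the F-score their harmonic mean. Axis-aligned rectangles stand in for polygons to keep the example dependency-free. The names `Rect`, `intersection_area`, and `area_scores` are hypothetical and not from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle used as a stand-in for a PoI polygon (assumption)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def area(self) -> float:
        # Degenerate or inverted rectangles are treated as having zero area.
        return max(0.0, self.max_x - self.min_x) * max(0.0, self.max_y - self.min_y)


def intersection_area(a: Rect, b: Rect) -> float:
    """Area of the overlap between two axis-aligned rectangles."""
    width = min(a.max_x, b.max_x) - max(a.min_x, b.min_x)
    height = min(a.max_y, b.max_y) - max(a.min_y, b.min_y)
    return max(0.0, width) * max(0.0, height)


def area_scores(predicted: Rect, truth: Rect) -> Tuple[float, float, float]:
    """Return (area precision, area recall, F-score) under the assumed definitions."""
    overlap = intersection_area(predicted, truth)
    precision = overlap / predicted.area() if predicted.area() > 0 else 0.0
    recall = overlap / truth.area() if truth.area() > 0 else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_score


# A small predicted region lying entirely inside a large ground-truth compound.
truth = Rect(0, 0, 100, 100)
predicted = Rect(10, 10, 40, 40)
precision, recall, f_score = area_scores(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f} f_score={f_score:.3f}")
```

In this toy case the prediction covers only 9% of the ground truth but never leaves it, so precision is 1.0 while recall is 0.09. Expanding the predicted shape toward the true boundary, as the OSM-based post-processing aims to do, trades some precision for recall. For arbitrary polygons, a geometry library would replace the rectangle arithmetic.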
