In this paper, we formulate the prediction of a geolocation from free text as a sequence-to-sequence problem. Under this formulation, we obtain a geocoding model by training a T5 encoder-decoder transformer with free text as the input and a geolocation as the output. The model was trained on geo-tagged wikidump data, using adaptive cell partitioning to represent geolocations. The code (including a REST-based application), the dataset, and the model checkpoints used in this work are publicly available.
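To make the formulation concrete, the following is a minimal sketch of how a coordinate might be turned into a target string for a sequence-to-sequence model. The abstract does not say which cell scheme is used, so the plain latitude/longitude quadtree here is an assumption, and every name (`cell_token`, `cell_center`, `adaptive_targets`, `make_pairs`) is hypothetical rather than taken from the released code. Cells are split only where points are dense, so sparse regions get short targets and dense regions get longer, finer ones.

```python
from collections import defaultdict


def cell_token(lat, lon, level):
    """Quadtree path (digits 0-3) of the cell containing (lat, lon).

    Assumption: a plain lat/lon quadtree, not necessarily the paper's scheme.
    """
    south, north, west, east = -90.0, 90.0, -180.0, 180.0
    digits = []
    for _ in range(level):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        quadrant = 0
        if lat >= mid_lat:      # northern half
            quadrant += 2
            south = mid_lat
        else:
            north = mid_lat
        if lon >= mid_lon:      # eastern half
            quadrant += 1
            west = mid_lon
        else:
            east = mid_lon
        digits.append(str(quadrant))
    return "".join(digits)


def cell_center(token):
    """Decode a quadtree path back to the (lat, lon) center of its cell."""
    south, north, west, east = -90.0, 90.0, -180.0, 180.0
    for digit in token:
        quadrant = int(digit)
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        if quadrant >= 2:
            south = mid_lat
        else:
            north = mid_lat
        if quadrant % 2 == 1:
            west = mid_lon
        else:
            east = mid_lon
    return ((south + north) / 2, (west + east) / 2)


def adaptive_targets(points, max_points=1, min_level=1, max_level=16):
    """Give each point the shallowest cell that holds at most max_points points."""
    targets = {}
    pending = [(0, list(range(len(points))))]
    while pending:
        level, indices = pending.pop()
        small_enough = len(indices) <= max_points or level == max_level
        if level >= min_level and small_enough:
            for i in indices:
                targets[i] = cell_token(points[i][0], points[i][1], level)
            continue
        # Split the current cell and regroup its points by child cell.
        groups = defaultdict(list)
        for i in indices:
            groups[cell_token(points[i][0], points[i][1], level + 1)].append(i)
        for group in groups.values():
            pending.append((level + 1, group))
    return [targets[i] for i in range(len(points))]


def make_pairs(texts, points):
    """Build (input text, target cell string) pairs for seq2seq training."""
    return list(zip(texts, adaptive_targets(points)))


# Illustrative inputs only; these are not samples from the paper's dataset.
texts = ["Eiffel Tower, Paris", "Palace of Versailles", "Sydney Opera House"]
points = [(48.8566, 2.3522), (48.8049, 2.1204), (-33.8688, 151.2093)]
pairs = make_pairs(texts, points)
```

At inference time, a model trained on such pairs would generate a cell string, and `cell_center` would map it back to an approximate coordinate. Longer strings correspond to smaller cells and therefore finer location estimates.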