The geolocation of online information is an essential component of any geospatial application. While most previous work on geolocation has focused on Twitter, in this paper we quantify and compare the performance of text-based geolocation methods on social media data drawn from both Blogger and Twitter. We introduce a novel set of location-specific features that are both highly informative and easily interpretable, and show that they achieve error rate reductions of up to 12.5% with respect to the best previously proposed geolocation features. We also show that, despite posting longer text, Blogger users are significantly harder to geolocate than Twitter users. Additionally, we investigate the effect of training and testing on different media (cross-media predictions) and of combining multiple social media sources (multi-media predictions). Finally, we explore the geolocability of social media in relation to three user dimensions: state, gender, and industry.