IP geolocation, the process of mapping network identifiers to physical locations, has myriad applications. We examine a large collection of snapshots from a popular geolocation database and take a first look at its longitudinal properties. We define metrics of IP geo-persistence, prevalence, coverage, and movement, and analyse 10 years of geolocation data at different location granularities. Across different classes of IP addresses, we find that significant location differences can exist even between successive instances of the database. This is a previously underappreciated source of potential error when using geolocation data: in 2019, 47% of end-user IP addresses moved by more than 40 km. To assess how sensitive research results are to the choice of database instance, we reproduce prior research that depended on geolocation lookups. In this case study, which analyses geolocation database performance on routers, we demonstrate the impact of these temporal effects: the median distance from ground truth shifted from 167 km to 40 km when using a snapshot taken two months apart. Based on our findings, we recommend best practices for using geolocation databases that encourage reproducibility and sound measurement.
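Both headline numbers above (the share of IPs moving more than 40 km, and the median distance from ground truth) reduce to great-circle distances between per-IP coordinates. The following is a minimal sketch, not the authors' code. It assumes each snapshot is a Python dict mapping an IP string to a (latitude, longitude) pair, uses the haversine formula on a spherical Earth, and introduces hypothetical helpers `moved_fraction` and `median_error_km` as simplified stand-ins for the paper's metrics, whose exact definitions are not given here.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def moved_fraction(old, new, threshold_km=40.0):
    """Hypothetical movement metric: share of IPs present in both snapshots
    whose recorded location changed by more than threshold_km."""
    common = old.keys() & new.keys()
    if not common:
        return 0.0
    moved = sum(1 for ip in common if haversine_km(old[ip], new[ip]) > threshold_km)
    return moved / len(common)


def median_error_km(snapshot, ground_truth):
    """Hypothetical accuracy metric: median distance between database
    locations and ground-truth locations for IPs present in both."""
    dists = [haversine_km(snapshot[ip], ground_truth[ip])
             for ip in snapshot.keys() & ground_truth.keys()]
    return median(dists) if dists else float("nan")


if __name__ == "__main__":
    # Illustrative data only, using RFC 5737 documentation addresses.
    snap_a = {"192.0.2.1": (51.5074, -0.1278), "192.0.2.2": (48.8566, 2.3522)}
    snap_b = {"192.0.2.1": (51.5074, -0.1278), "192.0.2.2": (52.5200, 13.4050)}
    print(moved_fraction(snap_a, snap_b))  # 0.5: one of two IPs moved (Paris to Berlin)
```

Comparing a router ground-truth set against two snapshots taken at different dates with `median_error_km` would expose the kind of shift the case study reports; the 40 km default threshold simply mirrors the figure quoted in the abstract.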