Anonymized smartphone-based mobility data has been widely adopted in devising and evaluating COVID-19 response strategies, such as the targeting of public health resources. Yet little attention has been paid to measurement validity and demographic bias, partly because there is little documentation of which users are represented and partly because ground-truth data on unique visits and demographics is hard to obtain. We illustrate how linking large-scale administrative data makes it possible to audit mobility data for bias even when demographic information and ground-truth labels are absent. More precisely, we show that linking voter roll data, which contains individual-level voter turnout for specific voting locations along with race and age, facilitates the construction of rigorous bias and reliability tests. These tests reveal a sampling bias that is particularly noteworthy in the pandemic context: older and non-white voters are less likely to be captured by mobility data. We show that allocating public health resources based on such mobility data could disproportionately harm high-risk elderly and minority groups.
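As a rough illustration of the kind of bias test that linked voter roll data makes possible, the sketch below computes, for each polling location, the fraction of known voters that the mobility data appears to capture, and then correlates that coverage with the share of older voters at the location. This is a minimal sketch and not necessarily the test used in the paper: the field names (`device_visits`, `voters`, `share_over_65`), the use of a plain Pearson correlation, and the toy numbers are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def coverage_bias(locations):
    """Correlate mobility-data coverage with a demographic share.

    Each location is a dict with hypothetical keys:
      device_visits  - visits recorded in the mobility data on election day
      voters         - ground-truth turnout from the linked voter roll
      share_over_65  - fraction of those voters aged over 65
    A negative value suggests older voters are under-captured.
    """
    coverage = [loc["device_visits"] / loc["voters"] for loc in locations]
    share = [loc["share_over_65"] for loc in locations]
    return pearson(coverage, share)


# Toy, made-up polling locations (not real data).
toy_locations = [
    {"device_visits": 50, "voters": 500, "share_over_65": 0.10},
    {"device_visits": 40, "voters": 500, "share_over_65": 0.20},
    {"device_visits": 30, "voters": 500, "share_over_65": 0.30},
    {"device_visits": 20, "voters": 500, "share_over_65": 0.40},
]

r = coverage_bias(toy_locations)
print(f"correlation between coverage and share over 65: {r:.3f}")
```

In this toy input, coverage falls steadily as the share of older voters rises, so the correlation is strongly negative. With real data, one would also want to account for differences in turnout and location size before reading such a correlation as evidence of sampling bias.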