Quantifying Spatial Under-reporting Disparities in Resident Crowdsourcing

Modern city governance relies heavily on crowdsourcing ("co-production") to identify problems such as downed trees and power lines. A major concern is that residents do not report problems at the same rates. This reporting heterogeneity translates directly into downstream disparities in how quickly incidents are addressed. Measuring such under-reporting is a difficult statistical task. By definition, we do not observe incidents that are never reported, and we do not observe when reported incidents first occurred. Thus, low reporting rates cannot be naively distinguished from low ground-truth incident rates. We develop a method to identify (heterogeneous) reporting rates without using external ground-truth data. Our insight is that the rates of duplicate reports about the same incident can be leveraged to separate whether an incident has occurred from the rate at which it is reported once it has occurred. Using this idea, we reduce the question to a standard Poisson rate estimation task, even though the full incident reporting interval is also unobserved. We apply our method to over 100,000 resident reports made to the New York City Department of Parks and Recreation and to over 900,000 reports made to the Chicago Department of Transportation and Department of Water Management. We find substantial spatial disparities in reporting rates even after controlling for incident characteristics; some neighborhoods report three times as quickly as others. These spatial disparities correspond to socio-economic characteristics: in NYC, higher population density, a higher fraction of residents with college degrees, higher income, and a higher fraction of White residents all correlate positively with reporting rates.
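The reduction to Poisson rate estimation can be illustrated with a minimal Python sketch. It rests on an illustrative assumption, which may differ from the paper's exact model: once an incident occurs, reports arrive as a homogeneous Poisson process with a group-specific rate until an observed end time (for example, an inspection). Because such a process is memoryless, the number of duplicate reports between the first report and that end time is Poisson with mean rate × window length, whatever the unobserved occurrence time was. The function names `estimate_reporting_rates` and `simulate_group`, the groups "A" and "B", and all parameter values are hypothetical.

```python
import random
from collections import defaultdict


def estimate_reporting_rates(incidents):
    """Per-group Poisson MLE of the reporting rate from duplicate reports.

    `incidents` holds (group, first_report_time, window_end, n_duplicates)
    tuples. Illustrative assumption: after the first report, duplicates in
    (first_report_time, window_end] are Poisson(rate[group] * window_length),
    so the MLE is total duplicates divided by total observed window length.
    """
    duplicates = defaultdict(int)
    exposure = defaultdict(float)
    for group, t_first, t_end, n_dup in incidents:
        duplicates[group] += n_dup
        exposure[group] += t_end - t_first
    return {g: duplicates[g] / exposure[g] for g in exposure if exposure[g] > 0}


def simulate_group(group, rate, n_incidents, rng):
    """Hypothetical simulator; the occurrence time is drawn but never returned."""
    observed = []
    for _ in range(n_incidents):
        t_occur = rng.uniform(0.0, 10.0)          # unobserved occurrence time
        t_end = t_occur + rng.uniform(5.0, 15.0)  # observed end of the window
        report_times = []
        t = t_occur + rng.expovariate(rate)       # Poisson report arrivals
        while t < t_end:
            report_times.append(t)
            t += rng.expovariate(rate)
        if not report_times:
            continue  # never reported, so invisible in the data
        # Only the first report, the window end, and the duplicate count are kept.
        observed.append((group, report_times[0], t_end, len(report_times) - 1))
    return observed


rng = random.Random(0)
data = simulate_group("A", 2.0, 2000, rng) + simulate_group("B", 0.5, 2000, rng)
rates = estimate_reporting_rates(data)
print(rates, rates["A"] / rates["B"])
```

With these settings, the estimates should land close to the true rates 2.0 and 0.5 (a ratio near 4). This holds even though occurrence times and never-reported incidents are hidden from the estimator. Counting how many incidents each group reports would instead mix the incident rate with the reporting rate. Controlling for incident characteristics, as the abstract describes, would need a regression model on top of this sketch (for example, a Poisson regression with log window length as an offset). That extension is not shown here.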

翻译：现代城市治理在很大程度上依赖于众包（“共同生产”），用于识别树木倒伏和停电等问题。一个主要问题是居民不会以相同的速度报告问题，报告异质性直接转化为在处理事件的速度上游固有的差异。测量这种欠报是一个困难的统计任务，因为根据定义，我们不能观察到未被报告的事件，也不能观察到报告事件第一次发生的时间。因此，不能轻易区分报告率低和事实发生率低的情况。我们开发了一种方法，用于在不使用外部基准数据的情况下识别（异质的）报告率。我们的见解是，关于相同事件的$\textit{副本}$报告的比率可以被利用来区分事件是否已经发生，并且它们一旦发生了，就可以用它来估计其报告率。利用这个想法，我们将问题转化为一个标准的泊松速率估计任务——即使完整的事件报告间隔也未被观察到。我们将该方法应用于纽约市公园和娱乐部门的超过100,000份居民报告以及芝加哥市交通和水务管理部门的超过900,000份报告，发现即使在控制事件特征后，报告率也存在相当大的空间差异——一些社区报告速度比其他社区要快三倍。这些空间差异对应于社会经济特征——在纽约，高人口密度、大学学位人口占比、收入和白人人口占比都与报告率呈正相关。