Crowdsourced data are increasingly used in practice to enhance situational awareness during disasters. While recent studies have shown promising results on the potential of crowdsourced data for flood mapping, little attention has been paid to data imbalance issues that could introduce biases. We examine the biases present in crowdsourced reports to identify data imbalances, with the goal of improving disaster situational awareness. We examine sample bias, spatial bias, and demographic bias by analyzing reported flooding from 3-1-1 reports, Waze reports, and FEMA damage data collected in the aftermath of Tropical Storm Imelda in 2019 and Hurricane Ida in 2021. Integrating other flood-related topics from 3-1-1 reports into the Global Moran's I and Local Indicator of Spatial Association (LISA) tests revealed additional communities that were impacted by floods. To examine spatial bias, we performed the LISA and BI-LISA tests on the three datasets at the census tract and census block group levels. Comparing the two geographical aggregations, we found that the larger spatial aggregation, census tracts, shows less data imbalance in the results. Finally, a one-way analysis of variance (ANOVA) test performed on the clusters generated by BI-LISA shows that data imbalance exists in areas where minority populations reside. Through a regression analysis, we also found that 3-1-1 and Waze reports have data imbalance limitations in areas where minority populations reside. The findings of this study advance the understanding of data imbalances and biases in crowdsourced datasets that are increasingly used for disaster situational awareness.
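For readers unfamiliar with the spatial autocorrelation statistic named above, the following is a minimal sketch of Global Moran's I in plain Python. It is an illustration only, not the study's implementation: the function name `global_morans_i`, the four-area chain adjacency, and the report counts are all hypothetical, and the abstract does not state which spatial weights specification the authors used.

```python
def global_morans_i(values, weights):
    """Global Moran's I for observations x_i and spatial weights w_ij.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    where S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    total = sum(d * d for d in dev)
    return (n / s0) * (cross / total)


# Hypothetical example: four areas in a row, each adjacent to its neighbors.
# Binary adjacency weights are an assumption made for this sketch.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Made-up report counts: high values next to high, low next to low.
clustered_counts = [10, 8, 2, 1]
# Made-up report counts: high and low values alternate.
alternating_counts = [10, 1, 10, 1]

i_clustered = global_morans_i(clustered_counts, chain_weights)
i_alternating = global_morans_i(alternating_counts, chain_weights)
print(i_clustered, i_alternating)
```

A positive value indicates that similar report counts sit next to each other (clustering), while a negative value indicates that neighboring areas tend to differ. LISA decomposes this global statistic into one local value per area, which is what allows clusters to be mapped at the census tract or block group level.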