Poverty maps are essential tools for governments and NGOs to track socioeconomic changes and adequately allocate infrastructure and services to places in need. Sensor and online crowd-sourced data combined with machine learning methods have provided a recent breakthrough in poverty map inference. However, these methods do not capture local wealth fluctuations, and are not optimized to produce accountable results that guarantee accurate predictions for all sub-populations. Here, we propose a pipeline of machine learning models to infer the mean and standard deviation of wealth across multiple geographically clustered populated places, and illustrate their performance in Sierra Leone and Uganda. These models leverage seven independent and freely available feature sources based on satellite images and on metadata collected via online crowd-sourcing and social media. Our models show that combined metadata features are the best predictors of wealth in rural areas, outperforming image-based models, which in turn are the best at predicting the highest wealth quintiles. Our results recover the local mean and variation of wealth, and correctly capture the positive yet non-monotonic correlation between them. We further demonstrate the capabilities and limitations of model transfer across countries, as well as the effects of data recency and other biases. Our methodology provides open tools to build towards more transparent and interpretable models that help governments and NGOs make informed decisions based on data availability, urbanization level, and poverty thresholds.