Interpreting wealth distribution via poverty map inference using multimodal data

Poverty maps are essential tools for governments and NGOs to track socioeconomic changes and to allocate infrastructure and services appropriately to places in need. Sensor and online crowd-sourced data combined with machine learning methods have provided a recent breakthrough in poverty map inference. However, these methods do not capture local wealth fluctuations, and are not optimized to produce accountable results that guarantee accurate predictions for all sub-populations. Here, we propose a pipeline of machine learning models to infer the mean and standard deviation of wealth across multiple geographically clustered populated places, and illustrate their performance in Sierra Leone and Uganda. These models leverage seven independent and freely available feature sources based on satellite images, and on metadata collected via online crowd-sourcing and social media. Our models show that combined metadata features are the best predictors of wealth in rural areas, outperforming image-based models, which in turn are the best for predicting the highest wealth quintiles. Our results recover the local mean and variation of wealth, and correctly capture the positive yet non-monotonic correlation between them. We further demonstrate the capabilities and limitations of model transfer across countries and the effects of data recency and other biases. Our methodology provides open tools to build towards more transparent and interpretable models that help governments and NGOs make informed decisions based on data availability, urbanization level, and poverty thresholds.
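To make the prediction targets concrete, here is a minimal sketch of how the per-cluster mean and standard deviation of wealth could be computed from household-level records before any model is trained. The abstract does not describe the authors' data format or code, so the function name `cluster_wealth_targets`, the `(cluster_id, wealth_score)` layout, and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cluster_wealth_targets(households):
    """Return {cluster_id: (mean_wealth, std_wealth)}.

    `households` is an iterable of (cluster_id, wealth_score) pairs.
    This layout is hypothetical; it is not taken from the paper.
    """
    by_cluster = defaultdict(list)
    for cluster_id, wealth in households:
        by_cluster[cluster_id].append(wealth)
    # Population standard deviation is used here as one possible choice;
    # the paper may define the within-cluster spread differently.
    return {cid: (mean(vals), pstdev(vals)) for cid, vals in by_cluster.items()}


# Toy, made-up data: two clusters with different internal spread.
toy_households = [("A", 1.0), ("A", 3.0), ("B", 2.0), ("B", 2.0), ("B", 2.0)]
targets = cluster_wealth_targets(toy_households)
```

In this toy case both clusters have the same mean wealth, but cluster "A" has a non-zero spread while cluster "B" has none. A model that predicts only the mean cannot tell them apart, which is the kind of local variation the abstract says earlier methods miss.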

