Data deprivation, the lack of easily available and actionable information on the well-being of individuals, is a significant challenge for the developing world and an impediment to the design and operationalization of policies intended to alleviate poverty. In this paper, we explore the suitability of data derived from OpenStreetMap (OSM) as a proxy for the location of two crucial public services: schools and health clinics. Thanks to the efforts of thousands of digital humanitarians, online mapping repositories such as OpenStreetMap contain millions of records on buildings and other structures, delineating both their location and, often, their use. Unfortunately, much of this data is locked in complex, unstructured text, rendering it seemingly unsuitable for classifying schools or clinics. We apply a scalable, unsupervised learning method to unlabeled OpenStreetMap building data to extract the location of schools and health clinics in ten countries in Africa. We find that the topic modeling approach greatly improves performance relative to reliance on structured keys alone. We validate our results by comparing the schools and clinics identified by our OSM method with those identified by the WHO, and we describe OSM coverage gaps more broadly.