Data analytics has tremendous potential to provide targeted benefit in low-resource communities; however, the availability of high-quality public health data is a significant challenge in developing countries, primarily due to non-diligent data collection by community health workers (CHWs). In this work, we define and test a data collection diligence score. To address this challenging unlabeled-data problem, we build on domain experts' guidance to design a useful representation of the raw data, and from this representation we derive a simple and natural score. An important aspect of the score is that it rates each CHW relative to other CHWs, which implicitly takes into account the context of the local area. We also cluster the data, and interpreting these clusters provides a natural explanation of each data collector's past behavior. We further predict the diligence score for future time steps. Our framework has been validated on the ground using observations by the field monitors of our partner NGO in India. Beyond the successful field test, our work is in the final stages of deployment in the state of Rajasthan, India.
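To make the idea of relative scoring concrete, here is a minimal sketch that is not the authors' method. It makes several assumptions. Each CHW is summarized by a single hypothetical per-CHW "anomaly rate", where a lower value means more diligent. CHWs are grouped by a hypothetical `area` label. The function `relative_diligence_scores` is an illustrative name. Each CHW is ranked only against peers in the same area, as a mid-rank percentile in [0, 1].

```python
from collections import defaultdict


def relative_diligence_scores(records):
    """Score each CHW relative to peers in the same area (illustrative sketch).

    records: iterable of (chw_id, area, anomaly_rate) tuples, where
    anomaly_rate is a hypothetical per-CHW feature (lower = more diligent).
    Returns {chw_id: score}, with 1.0 = most diligent within the area.
    """
    # Group CHWs by area so each is compared only with local peers.
    by_area = defaultdict(list)
    for chw_id, area, rate in records:
        by_area[area].append((chw_id, rate))

    scores = {}
    for peers in by_area.values():
        n = len(peers)
        for chw_id, rate in peers:
            if n == 1:
                # No local peers to compare against; assume full score.
                scores[chw_id] = 1.0
                continue
            # Count peers with a strictly higher (worse) anomaly rate.
            worse = sum(1 for _, r in peers if r > rate)
            # Ties with other peers get half credit (mid-rank percentile).
            ties = sum(1 for _, r in peers if r == rate) - 1
            scores[chw_id] = (worse + 0.5 * ties) / (n - 1)
    return scores


if __name__ == "__main__":
    data = [
        ("a", "north", 0.1), ("b", "north", 0.3), ("c", "north", 0.5),
        ("d", "south", 0.6), ("e", "south", 0.8),
    ]
    print(relative_diligence_scores(data))
```

In this toy input, CHW `d` has a higher absolute anomaly rate than `b`, yet receives the top score of 1.0 because it is compared only with peers in its own area. This is one way a relative score can absorb local context without modeling that context explicitly.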