Advances in information and communication technologies for development (ICT4D) and data science facilitate systematic, reproducible, and scalable data cleaning for strengthening routine health information systems. A logic model for data cleaning was used; it included an algorithm for screening, diagnosing, and editing datasets in a rule-based, interactive, and semi-automated manner. Computational workflows and operational definitions were prepared a priori. Model performance was illustrated using the dengue line-list of the National Vector Borne Disease Control Programme, Punjab, India, from 01 January 2015 to 31 December 2019. Cleaning and imputation of an estimated date were successful for 96.1% and 98.9% of records for 2015 and 2016, respectively, and for all cases in 2017, 2018, and 2019. Information on age and sex was cleaned and extracted for more than 98.4% and 99.4% of records, respectively. Applying the logic model produced an analysis-ready dataset that can be used to understand spatiotemporal epidemiology and to facilitate data-based public health decision making.
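The screening, diagnosis, and editing steps described above can be illustrated with a minimal Python sketch. It is not the authors' workflow: the field names (`date`, `age`, `sex`), the accepted date formats, the plausibility range for age, and the helper names are all assumptions chosen for illustration. Each rule either returns a standardized value or `None`, and fields that return `None` are flagged for interactive review rather than silently edited.

```python
import re
from datetime import date, datetime

# Hypothetical operational definitions, fixed before cleaning starts.
SEX_MAP = {"m": "male", "male": "male", "f": "female", "female": "female"}
DATE_FORMATS = ("%d-%m-%Y", "%d/%m/%Y", "%Y-%m-%d")  # assumed input formats
STUDY_START, STUDY_END = date(2015, 1, 1), date(2019, 12, 31)


def clean_sex(raw):
    """Map free-text sex entries to a standard label, or None if unrecognized."""
    return SEX_MAP.get(str(raw).strip().lower())


def clean_age(raw):
    """Extract age in years from text such as '35', '35 yrs', or '6 months'."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*([a-z]*)", str(raw).strip().lower())
    if not match:
        return None
    value, unit = float(match.group(1)), match.group(2)
    if unit.startswith("mo"):  # treat 'mo', 'months', ... as months
        value /= 12
    return value if 0 <= value <= 120 else None  # assumed plausibility range


def clean_date(raw):
    """Parse a date in any assumed format; reject dates outside the study period."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(str(raw).strip(), fmt).date()
        except ValueError:
            continue  # try the next format
        return parsed if STUDY_START <= parsed <= STUDY_END else None
    return None


def clean_record(record):
    """Screen and edit one line-list record; return cleaned values and flagged fields."""
    cleaned = {
        "date": clean_date(record.get("date", "")),
        "age_years": clean_age(record.get("age", "")),
        "sex": clean_sex(record.get("sex", "")),
    }
    flagged = [field for field, value in cleaned.items() if value is None]
    return cleaned, flagged


def percent_cleaned(records, field):
    """Share of records (in percent) with a usable value for the given field."""
    usable = sum(clean_record(r)[0][field] is not None for r in records)
    return 100 * usable / len(records)
```

Calling `clean_record({"date": "05-08-2015", "age": "35 yrs", "sex": "M"})` returns standardized values with an empty list of flagged fields, while a record with an unparseable date comes back with `"date"` flagged. A per-field summary such as `percent_cleaned(records, "sex")` corresponds to the kind of completeness figure reported in the abstract, although the actual rules behind those figures are not specified here.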