While electronic health records (EHRs) are a rich data source for biomedical research, these systems are not implemented uniformly across healthcare settings, and significant data may be missing due to healthcare fragmentation and a lack of interoperability between siloed EHR systems. Because deleting cases with missing data may introduce severe bias in the subsequent analysis, several authors prefer to apply a multiple imputation (MI) strategy to recover the missing information. Unfortunately, although several works in the literature have documented promising results with the various MI algorithms now freely available for research, there is no consensus on which MI algorithm works best. Besides the choice of the MI strategy, the choice of the imputation algorithm and its application settings is both crucial and challenging. In this paper, inspired by the seminal works of Rubin and van Buuren, we propose a methodological framework that may be applied to evaluate and compare several MI techniques, with the aim of choosing the most valid one for computing inferences in a clinical research study. We have applied our framework to validate, and extend to a larger cohort, the results we presented in a previous study, in which we evaluated the influence of crucial patient descriptors and COVID-19 severity in patients with type 2 diabetes mellitus whose data are provided by the National COVID Cohort Collaborative Enclave.