A growing body of research applies machine learning methods to electronic health record (EHR) data for a range of clinical purposes. This growth has exposed the challenges of accessing EHR data. MIMIC is a popular, free, publicly available EHR dataset, distributed in a raw format, that has been used in numerous studies. However, the absence of standardized pre-processing steps can be a significant barrier to wider adoption of this rare resource. It can also reduce the reproducibility of the tools developed and limit the ability to compare results across similar studies. In this work, we provide a highly customizable pipeline to extract, clean, and pre-process the data available in the fourth version of the MIMIC dataset (MIMIC-IV). The pipeline also provides an end-to-end, wizard-like package that supports the creation and evaluation of predictive models. It covers a range of clinical prediction tasks that fall broadly into four categories: readmission, length of stay, mortality, and phenotype prediction. The tool is publicly available at https://github.com/healthylaife/MIMIC-IV-Data-Pipeline.