Machine learning has been widely used in healthcare applications to approximate complex models for clinical diagnosis, prognosis, and treatment. Although deep learning has an outstanding ability to extract information from time series, its true capabilities on sparse, irregularly sampled, multivariate, and imbalanced physiological data are not yet fully explored. In this paper, we systematically examine the performance of machine learning models on a clinical prediction task based on electronic health records (EHR), especially physiological time series. We use the public PhysioNet 2019 Challenge dataset to predict sepsis outcomes in ICUs. We compare ten baseline machine learning models commonly used in the clinical prediction domain: 3 deep learning methods and 7 non-deep learning methods. Nine evaluation metrics with specific clinical implications are used to assess model performance. In addition, we sub-sample the training dataset at different sizes and fit learning curves to investigate the impact of training dataset size on model performance. We also propose a general pre-processing method for physiological time-series data and use Dice Loss to deal with the class imbalance problem. The results show that deep learning does outperform non-deep learning, but only under certain conditions: first, when evaluated with particular metrics (AUROC, AUPRC, Sensitivity, and FNR) but not others; second, when the training dataset is large enough (estimated to be on the order of thousands).
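The abstract names Dice Loss as the remedy for class imbalance but does not give its formulation. As a rough illustration only, the sketch below implements the commonly used soft Dice loss for binary labels in plain Python. The function name `dice_loss`, the `smooth` term, and its default value are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def dice_loss(probs: Sequence[float], labels: Sequence[int], smooth: float = 1.0) -> float:
    """Soft Dice loss for binary classification (illustrative sketch).

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : ground-truth labels, each 0 or 1
    smooth : assumed smoothing constant; it avoids division by zero when
             both the predictions and the labels are all zero
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")

    # Overlap between predicted positives and true positives.
    intersection = sum(p * y for p, y in zip(probs, labels))
    # Total predicted mass plus total number of true positives.
    denominator = sum(probs) + sum(labels)

    dice_coefficient = (2.0 * intersection + smooth) / (denominator + smooth)
    return 1.0 - dice_coefficient


if __name__ == "__main__":
    # One positive among many negatives, as in an imbalanced cohort.
    labels = [1, 0, 0, 0, 0, 0, 0, 0]
    always_negative = [0.0] * 8
    correct = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(dice_loss(always_negative, labels))
    print(dice_loss(correct, labels))
```

Because the loss depends only on the overlap with the positive class, a model that always predicts the majority (negative) class is still penalised, which is the usual motivation for using it on imbalanced data. With `smooth=0.0` the function divides by zero when both inputs are all zeros, so a positive smoothing value is the safer default.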