Despite the explosion of interest in healthcare AI research, the reproducibility and benchmarking of this research are often limited by the lack of standard benchmark datasets and by the diversity of evaluation metrics in use. To address this reproducibility challenge, we develop PyHealth, an open-source Python toolbox for developing various predictive models on healthcare data. PyHealth consists of a data preprocessing module, a predictive modeling module, and an evaluation module. Its target users are both computer science researchers and healthcare data scientists, who can use PyHealth to build complex machine learning pipelines on healthcare datasets with fewer than ten lines of code. The data preprocessing module transforms complex healthcare datasets, such as longitudinal electronic health records, medical images, continuous signals (e.g., electrocardiograms), and clinical notes, into machine-learning-friendly formats. The predictive modeling module provides more than 30 machine learning models, including established ensemble trees and deep neural network-based approaches, via a unified but extensible API designed for both researchers and practitioners. The evaluation module provides various evaluation strategies (e.g., cross-validation and train-validation-test split) and predictive model metrics. With robustness and scalability in mind, best practices such as unit testing, continuous integration, code coverage, and interactive examples are followed in the library's development. PyHealth can be installed through the Python Package Index (PyPI) or from https://github.com/yzhao062/PyHealth.
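The two evaluation strategies named above, a train-validation-test split and cross-validation, can be illustrated with a minimal standard-library sketch. This is not PyHealth's API: the function names `train_val_test_split` and `k_fold_indices`, the default ratios, and the seed handling are all assumptions made for illustration only.

```python
import random
from typing import List, Sequence, Tuple


def train_val_test_split(
    n_samples: int,
    ratios: Sequence[float] = (0.7, 0.1, 0.2),
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle sample indices and cut them into train/validation/test parts.

    Hypothetical helper for illustration; not part of PyHealth.
    """
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three values that sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, held_out) index pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not part of PyHealth.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, held_out))
    return splits


if __name__ == "__main__":
    train, val, test = train_val_test_split(100)
    print(len(train), len(val), len(test))
    for fold_id, (tr, held) in enumerate(k_fold_indices(10, k=5)):
        print(fold_id, sorted(held))
```

Fixing the seed makes the partition repeatable across runs, which is the property a shared benchmark needs when different models are compared on the same data.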