The Consequences of the Framing of Machine Learning Risk Prediction Models: Evaluation of Sepsis in General Wards

Objectives: To evaluate the consequences of the framing of machine learning risk prediction models. We evaluate how framing affects model performance and model learning in four different approaches previously applied in published artificial-intelligence (AI) models.

Setting and participants: We analysed structured secondary healthcare data from 221,283 citizens aged 18 years or older from four Danish municipalities.

Results: The four models had similar population-level performance (mean area under the receiver operating characteristic curve of 0.73 to 0.82), whereas the mean average precision varied greatly, from 0.007 to 0.385. Correspondingly, the percentage of missing values also varied between framing approaches. The on-clinical-demand framing, which created a sample each time clinicians made an early warning score assessment, had the lowest percentage of missing values among the vital sign parameters, and this model was also able to learn more temporal dependencies than the others. Shapley additive explanations (SHAP) revealed opposing interpretations of SpO2 in the prediction of sepsis as a consequence of differently framed models.

Conclusions: The profound consequences of framing demand attention from clinicians and AI developers, because understanding and reporting framing is pivotal to the successful development and clinical implementation of future AI technology. Model framing must reflect the expected clinical environment. The importance of proper problem framing is by no means exclusive to sepsis prediction; it applies to most clinical risk prediction models.
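
The results contrast near-identical AUROC values with widely varying average precision. A general property that can produce such a gap is that AUROC does not depend on the share of positive samples, whereas average precision falls as positives become rarer. This is offered as an assumption and is not the authors' stated explanation. Each framing defines its own set of samples, so the positive rate per sample can differ between framings. The minimal Python sketch below uses synthetic Gaussian scores, not the study's data. It evaluates one hypothetical model at 10% and at 0.5% prevalence. The helper names `auroc`, `average_precision` and `simulate`, and all distribution parameters, are illustrative assumptions.

```python
import random


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic, using mid-ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the block of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Non-interpolated AP: mean precision at the rank of each positive.

    Assumes at least one positive and no tied scores (ties are ordered arbitrarily).
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank
    return total / tp


def simulate(seed=0, n_pos=500, n_neg_high=4500, n_neg_low=99500):
    """Score the same hypothetical model at 10% and 0.5% prevalence."""
    rng = random.Random(seed)
    # Hypothetical score distributions: positives shifted by one standard deviation.
    pos = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    neg = [rng.gauss(0.0, 1.0) for _ in range(n_neg_low)]
    results = []
    for n_neg in (n_neg_high, n_neg_low):
        scores = pos + neg[:n_neg]  # same positives, more or fewer negatives
        labels = [1] * n_pos + [0] * n_neg
        results.append((auroc(labels, scores), average_precision(labels, scores)))
    (auc_hi, ap_hi), (auc_lo, ap_lo) = results
    return auc_hi, ap_hi, auc_lo, ap_lo


if __name__ == "__main__":
    auc_hi, ap_hi, auc_lo, ap_lo = simulate()
    print(f"prevalence 10%:  AUROC={auc_hi:.3f}  AP={ap_hi:.3f}")
    print(f"prevalence 0.5%: AUROC={auc_lo:.3f}  AP={ap_lo:.3f}")
```

In this sketch the two AUROC values stay nearly equal, while average precision at 0.5% prevalence is far lower than at 10%. This is one reason to report a precision-based metric alongside AUROC when comparing differently framed models.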