Predicting Metabolic Dysfunction-Associated Steatotic Liver Disease Using Machine Learning Methods

Background: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) affects approximately 33% of U.S. adults and is the most common chronic liver disease. Although it is often asymptomatic, it can progress to cirrhosis. Early detection is important because lifestyle interventions can prevent disease progression. Using a large electronic health record (EHR) database, we developed a fair, rigorous, and reproducible MASLD prediction model and compared it with previously published methods.

Methods: We evaluated LASSO logistic regression, random forest, XGBoost, and a neural network for MASLD prediction using subsets of clinical features, including the top 10 features ranked by SHAP values. To reduce disparities in true positive rates across racial and ethnic subgroups, we applied an equal opportunity postprocessing method.

Results: The training, validation, and test sets included 59,492, 24,198, and 25,188 patients, respectively. We selected the LASSO logistic regression model with the top 10 features for its interpretability and its performance, which was comparable to that of the other models evaluated. Before fairness adjustment, the model achieved an AUROC of 0.84, accuracy of 78%, sensitivity of 72%, specificity of 79%, and F1-score of 0.617. After equal opportunity postprocessing, accuracy rose modestly to 81% and specificity rose to 94%, while sensitivity fell to 41% and the F1-score to 0.515, reflecting the fairness trade-off.

Conclusions: We developed MASER (MASLD Static EHR Risk Prediction), a LASSO logistic regression model that achieved competitive performance for MASLD prediction (AUROC 0.836, accuracy 77.6%), comparable to previously reported ensemble and tree-based models. Overall, this approach shows that interpretable models can balance predictive performance and fairness in diverse patient populations.
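The equal opportunity step in the Methods, and the sensitivity/specificity trade-off in the Results, can be illustrated with a minimal sketch. It assumes the model outputs a risk score per patient and that postprocessing chooses a separate decision threshold for each racial or ethnic group so that every group reaches roughly the same true positive rate. This is a simplified, deterministic variant (the standard equal opportunity postprocessing can also randomize between thresholds to match rates exactly); the function names, the `target_tpr` parameter, and the toy data are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float  # true positive rate (recall)
    specificity: float  # true negative rate
    f1: float


def confusion_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return Metrics((tp + tn) / len(pairs), sensitivity, specificity, f1)


def true_positive_rate(scores, labels, threshold):
    """Fraction of positive cases whose score is at or above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def equal_opportunity_thresholds(scores, labels, groups, target_tpr):
    """Hypothetical helper: per group, the highest threshold reaching target_tpr.

    Deterministic simplification: it does not randomize between thresholds,
    so group TPRs are only approximately equal on small samples.
    """
    if not 0.0 <= target_tpr <= 1.0:
        raise ValueError("target_tpr must be in [0, 1]")
    thresholds = {}
    for g in sorted(set(groups)):
        g_scores = [s for s, gg in zip(scores, groups) if gg == g]
        g_labels = [y for y, gg in zip(labels, groups) if gg == g]
        # Candidate cut-offs are the scores of positive cases, highest first;
        # lowering the threshold can only raise the group's TPR, and the
        # lowest candidate always gives TPR = 1.0, so next() always succeeds.
        candidates = sorted({s for s, y in zip(g_scores, g_labels) if y == 1}, reverse=True)
        if not candidates:
            raise ValueError(f"group {g!r} has no positive cases")
        thresholds[g] = next(
            t for t in candidates
            if true_positive_rate(g_scores, g_labels, t) >= target_tpr
        )
    return thresholds


def apply_thresholds(scores, groups, thresholds):
    """Classify each patient using the threshold of their own group."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


if __name__ == "__main__":
    # Toy, made-up risk scores for two hypothetical groups "A" and "B".
    scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.6, 0.5, 0.2]
    labels = [1, 1, 0, 0, 1, 0, 1, 0]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

    shared = [int(s >= 0.5) for s in scores]  # one threshold for everyone
    print(confusion_metrics(labels, shared))

    cutoffs = equal_opportunity_thresholds(scores, labels, groups, target_tpr=0.5)
    adjusted = apply_thresholds(scores, groups, cutoffs)
    print(cutoffs)  # {'A': 0.9, 'B': 0.7}
    print(confusion_metrics(labels, adjusted))
```

On this toy data both groups end up with a true positive rate of 0.5. Compared with a single shared threshold of 0.5 (sensitivity 1.0, specificity 0.75), the group-specific thresholds give sensitivity 0.5 and specificity 1.0, the same direction of change as the trade-off reported above, though the numbers are illustrative only.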
