An Improved Heart Disease Prediction Using Stacked Ensemble Method

From arXiv: 14 pages, 5 figures; submitted to Springer Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering.

Heart disease has recently overtaken cancer as the world's leading cause of mortality. Early identification and treatment can reduce cardiac failures, heart disease mortality, and diagnostic costs. The healthcare industry collects medical data in large quantities, but this data is not well mined. Discovering previously unknown patterns and connections in it can support better decisions when forecasting heart disease risk. In the proposed study, we constructed an ML-based diagnostic system for heart disease prediction using a heart disease dataset. We used data preprocessing techniques (outlier detection and removal, checking for and removing missing entries, and feature normalization), cross-validation, nine classification algorithms (RF, MLP, KNN, ETC, XGB, SVC, ADB, DT, and GBM), and eight performance metrics (classification accuracy, precision, F1 score, specificity, ROC, sensitivity, log-loss, and Matthews correlation coefficient). Our method can readily differentiate between people who have cardiac disease and those who are healthy. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were determined for every classifier. This study discusses most of the classifiers, preprocessing strategies, validation methods, and performance metrics used for classification models. The performance of the proposed scheme has been confirmed, utilizing all of its capabilities. In this work, the impact of clinical decision support systems was evaluated using a stacked ensemble approach that included these nine algorithms.
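To make the eight metrics named in the abstract concrete, here is a minimal sketch of how they are commonly computed for a binary classifier. It is not the authors' code. The function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration, and ROC is summarized here by the area under the curve.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute eight binary-classification metrics (hypothetical helper).

    y_true: list of 0/1 labels (1 = heart disease, 0 = healthy).
    y_prob: list of predicted probabilities for class 1.
    threshold: assumed cut-off for turning probabilities into labels.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )

    # Log-loss: clip probabilities so that log(0) never occurs.
    eps = 1e-15
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, clipped)
    ) / len(y_true)

    # ROC AUC as the chance that a random positive case scores higher
    # than a random negative case (ties count as one half).
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg
    )
    roc_auc = ratio(wins, len(pos) * len(neg))

    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "precision": precision,
        "f1": f1,
        "specificity": specificity,
        "roc_auc": roc_auc,
        "sensitivity": sensitivity,
        "log_loss": log_loss,
        "mcc": mcc,
    }


# Toy values invented for this example, not taken from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
metrics = binary_metrics(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In a setup like the one the abstract describes, a function of this kind would be applied to each classifier's held-out predictions within cross-validation, so that the nine base models and the stacked ensemble are compared on the same metrics.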
