Large volumes of data are now available in almost every sector, and this data has become an asset because it can be mined for information. The health care industry in particular holds extensive patient and disease-related records. Machine learning techniques can uncover hidden patterns in such data and use them to predict various diseases. Cardiovascular diseases (CVDs) have become a leading cause of death around the world, and the number of deaths they cause is alarming. For this reason, many researchers are working to design predictive models based on data mining that could help save lives. In this research, several fusion models were constructed to diagnose CVDs along with their severity. Machine learning (ML) algorithms, namely artificial neural network, SVM, logistic regression, decision tree, random forest, and AdaBoost, were applied to a heart disease dataset to predict disease. RandomOverSampler was used to address the class imbalance in the multiclass classification task. To improve classification performance, a weighted score fusion approach was adopted: the models were first trained individually, and then the decisions of two algorithms were combined using a weighted sum rule. In total, three fusion models were developed from the six ML algorithms. The proposed approach was evaluated with different train-test split ratios on both binary and multiclass classification problems, and the fusion models performed well on both. The highest accuracy was 75% for multiclass classification and 95% for binary classification. The code is available at: https://github.com/hafsa-kibria/Weighted_score_fusion_model_heart_disease_prediction
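The weighted sum rule mentioned above can be illustrated with a minimal sketch. It assumes each trained model outputs per-class scores (for example, class probabilities) for every sample. The function names, the SVM/neural-network pairing, the example scores, and the weight of 0.4 are all hypothetical choices made for illustration; they are not taken from the paper or its repository.

```python
from typing import List, Sequence


def weighted_score_fusion(
    scores_a: Sequence[Sequence[float]],
    scores_b: Sequence[Sequence[float]],
    weight_a: float = 0.5,
) -> List[List[float]]:
    """Fuse two models' per-class scores with a weighted sum rule.

    fused = weight_a * scores_a + (1 - weight_a) * scores_b
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same number of samples")

    weight_b = 1.0 - weight_a
    fused = []
    for row_a, row_b in zip(scores_a, scores_b):
        if len(row_a) != len(row_b):
            raise ValueError("both models must score the same set of classes")
        fused.append([weight_a * a + weight_b * b for a, b in zip(row_a, row_b)])
    return fused


def predict_labels(fused_scores: Sequence[Sequence[float]]) -> List[int]:
    """Pick the class index with the highest fused score for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused_scores]


# Hypothetical class-probability outputs of two already-trained models
# for three samples in a binary problem (columns: class 0, class 1).
svm_scores = [[0.60, 0.40], [0.45, 0.55], [0.20, 0.80]]
ann_scores = [[0.30, 0.70], [0.70, 0.30], [0.10, 0.90]]

# Assumed weight: 0.4 for the first model, 0.6 for the second.
fused = weighted_score_fusion(svm_scores, ann_scores, weight_a=0.4)
labels = predict_labels(fused)
print(labels)
```

In this sketch the two models disagree on the first two samples, and the fused decision follows whichever model contributes the larger weighted score. In practice the weight would typically be tuned on held-out validation data, and the same functions apply unchanged to multiclass scores with more than two columns.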