For medical diagnosis, health professionals rely on a variety of pathological methods to reach decisions about a patient's medical condition and to prepare medical reports. Modern computing technology makes it possible to collect data and to visualize patterns hidden in them. Statistical machine learning algorithms tailored to a specific problem can assist in decision-making: data-driven algorithms can be used to validate existing methods and to help researchers suggest potential new decisions. In this paper, multiple imputation by chained equations (MICE) was applied to handle missing data, and Principal Component Analysis (PCA) was used to reduce dimensionality. Data visualizations were used to reveal significant findings. We present and compare several binary classifiers (Artificial Neural Network, Random Forest, Support Vector Machine) used to distinguish blood donors from non-blood donors with hepatitis, fibrosis, or cirrhosis. All of these techniques were applied to data published in the UCI Machine Learning Repository (UCI-MLR) [1] to identify a better classification method that can help health professionals in a laboratory make better decisions. Our proposed ML method showed a better accuracy score (e.g., 98.23% for SVM), and thus improved the quality of classification.
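The pipeline described above (imputation, dimensionality reduction, binary classification) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the UCI-MLR dataset, and picks the number of components, the kernel, and the train/test split arbitrarily. scikit-learn's IterativeImputer is a chained-equations imputer inspired by MICE, but with default settings it returns a single imputation rather than multiple ones.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, activates IterativeImputer
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 400 samples, 10 numeric features,
# label 0 = "blood donor", label 1 = "non-blood donor".
n_samples, n_features = 400, 10
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_features)) + 2.0 * y[:, None]

# Remove roughly 5% of the entries to mimic missing laboratory values.
missing = rng.random(X.shape) < 0.05
X[missing] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chained-equations imputation -> scaling -> PCA -> SVM.
# Fitting inside one pipeline keeps test data out of the imputer and PCA fits.
model = make_pipeline(
    IterativeImputer(max_iter=10, random_state=0),
    StandardScaler(),
    PCA(n_components=5),   # hypothetical choice of component count
    SVC(kernel="rbf"),     # hypothetical choice of kernel
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Replacing SVC with RandomForestClassifier or MLPClassifier in the last pipeline step would give the analogous Random Forest and neural network variants for comparison. The accuracy printed here reflects only the synthetic data and says nothing about the results reported in the paper.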