Data mining techniques offer increased accuracy and the potential to detect unseen patterns, and have therefore been widely adopted for standard classification problems. They have often been used for high-precision disease prediction in the medical field, and several hybrid prediction models capable of achieving high accuracy have been proposed. Nevertheless, most previous models fail to address efficiently the recurring issue of poor data quality, which plagues most high-dimensional data and proves especially troublesome in highly sensitive medical data. This work proposes a robust self-healing (RSH) hybrid prediction model that uses the data in its entirety, removing errors and inconsistencies from it rather than discarding any data. Initial processing consists of data preparation followed by cleansing (scrubbing) through context-dependent attribute correction, which ensures that no significant amount of relevant information is lost before the feature selection and prediction phases. The prediction model is built from an ensemble of heterogeneous classifiers subjected to local boosting, and a genetic-algorithm-based wrapper feature selection technique, wrapped around each respective classifier, selects the corresponding optimal feature subset in order to achieve higher accuracy. The proposed method is compared with several existing high-performing models, and the results are analyzed.
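The genetic-algorithm-based wrapper feature selection mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid classifier stands in for the boosted heterogeneous ensemble, the synthetic data set replaces real medical data, and all names (`make_data`, `nearest_centroid_accuracy`, `ga_select`) and parameter values are assumptions chosen for illustration. Each individual is a boolean mask over the features, and its fitness is the held-out accuracy of the wrapped classifier trained on the masked features only.

```python
import random

def make_data(n, rng):
    """Synthetic stand-in data: 2 informative features followed by 4 noise features."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 5.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Wrapper fitness: test accuracy of a nearest-centroid classifier
    (a placeholder for the real classifier) using only the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    sums, counts = {}, {}
    for row, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for k, j in enumerate(cols):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    correct = 0
    for row, label in zip(X_test, y_test):
        pred = min(
            centroids,
            key=lambda c: sum((row[j] - centroids[c][k]) ** 2
                              for k, j in enumerate(cols)),
        )
        correct += pred == label
    return correct / len(y_test)

def ga_select(fitness, n_features, pop_size=20, generations=15,
              mutation_rate=0.1, rng=None):
    """Evolve boolean feature masks; return the best mask found."""
    rng = rng or random.Random(0)
    # Seed the population with the full feature set so the result is never
    # worse than using all features; the rest are random masks.
    pop = [[True] * n_features]
    pop += [[rng.random() < 0.5 for _ in range(n_features)]
            for _ in range(pop_size - 1)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]      # truncation selection
        children = [scored[0][:]]              # elitism: keep the top mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]            # bit-flip mutation
            children.append(child)
        pop = children
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best

# Usage: select features on the synthetic data and report the result.
data_rng = random.Random(1)
X_tr, y_tr = make_data(200, data_rng)
X_te, y_te = make_data(100, data_rng)
fitness = lambda m: nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, m)
best_mask = ga_select(fitness, n_features=6, rng=random.Random(0))
print("selected mask:", best_mask, "accuracy:", fitness(best_mask))
```

In the setting described in the abstract, the fitness function would instead evaluate each classifier of the ensemble on its own candidate subset, so that every classifier receives a feature set tuned to it. Scoring on a single held-out split, as done here, is a simplification; cross-validation would give a less noisy fitness signal.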