Over the last few years, clinical decision support using data mining techniques has offered a more intelligent way to reduce decision errors. However, clinical datasets often have a high proportion of missing values, which can degrade model quality if handled improperly. Imputing the missing values offers a way to resolve this issue. Conventional approaches rely on simple statistical methods, such as mean imputation or discarding incomplete cases. These have many limitations and thus degrade learning performance. This study examines a series of machine-learning-based imputation methods and proposes an efficient approach to preparing a high-quality breast cancer (BC) dataset. The aim is to find the relationship between BC treatment and chemotherapy-related amenorrhoea, and performance is evaluated by prediction accuracy.
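To illustrate why mean imputation can be limiting compared with a learning-based method, the sketch below contrasts the two on a small, hypothetical dataset. k-nearest-neighbour (kNN) imputation is used here only as one familiar example of machine-learning-based imputation; it is an assumption for illustration, not necessarily one of the methods examined in the study. The function names `mean_impute` and `knn_impute` and the toy values are likewise hypothetical. Missing values are represented as `None`.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """Replace each None with the column mean over the k nearest complete rows.

    Distance is computed only on the features the incomplete row actually has,
    so the filled value reflects rows that look similar on observed features.
    """
    complete = [r for r in rows if None not in r]
    result = []
    for r in rows:
        if None not in r:
            result.append(list(r))
            continue
        observed_cols = [j for j, v in enumerate(r) if v is not None]
        neighbours = sorted(
            complete,
            key=lambda c: dist([r[j] for j in observed_cols], [c[j] for j in observed_cols]),
        )[:k]
        result.append([
            v if v is not None else sum(c[j] for c in neighbours) / len(neighbours)
            for j, v in enumerate(r)
        ])
    return result


# Hypothetical toy data: two well-separated groups, one row missing its second feature.
toy = [
    [1.0, 2.0],
    [1.2, 2.2],
    [8.0, 9.0],
    [8.2, 9.2],
    [1.1, None],
]
print(mean_impute(toy)[4])  # gap filled with the overall column mean (about 5.6)
print(knn_impute(toy)[4])   # gap filled from the two most similar rows (about 2.1)
```

Mean imputation ignores the relationship between features and places the missing value between the two groups. The neighbour-based fill respects that the incomplete row resembles the first group. In a study like this one, each imputed dataset would then be compared by the accuracy of a downstream predictive model.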