Handling missing values in training datasets, whether for building learning models or for extracting useful information, is an important research task in data mining and knowledge discovery in databases. In recent years, many techniques have been proposed that impute missing values by considering the relationships between the attributes of the observation with missing values and the other observations in the training dataset. The main deficiency of these techniques is that they rely on a single approach rather than combining several, which limits their accuracy. To improve the accuracy of missing value imputation, in this paper we introduce a novel partial matching concept in association rule mining, which gives better results than the full matching concept described in our previous work. Our imputation technique combines partial matching in association rules with the k-nearest neighbor approach. Because it is a hybrid technique, its accuracy is much higher than that of techniques that depend on a single approach. To evaluate the effectiveness of our technique, we also provide detailed experimental results on a number of benchmark datasets, which show better results than previous approaches.
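The abstract does not say how the rules are mined, how a partial match is scored, or how the two approaches are combined, so the Python sketch below is only one plausible reading of the idea, not the authors' algorithm. It assumes categorical attributes, mines simple rules of the form "antecedent → target value" from observed rows, lets a rule fire when at least a fraction `min_match` of its antecedent items agree with the record (`min_match=1.0` reduces to full matching), and falls back to k-nearest-neighbor majority voting with a normalized Hamming distance when no rule fires. All function names, the thresholds `min_support`, `min_conf`, `min_match` and `k`, the confidence × match-fraction score, and the rules-first/k-NN-fallback ordering are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_rules(rows, target, max_len=2, min_support=2, min_conf=0.6):
    """Hypothetical miner: rules 'antecedent -> target=value' from rows where target is known.

    Rows are dicts (attribute -> categorical value, or None if missing).
    Returns (antecedent, value, confidence) tuples; antecedent is a frozenset
    of (attribute, value) pairs with 1..max_len items.
    """
    known = [r for r in rows if r[target] is not None]
    attrs = sorted(a for a in rows[0] if a != target)
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(attrs, size):
            groups = {}
            for r in known:
                if any(r[a] is None for a in combo):
                    continue  # count only rows where the whole antecedent is observed
                key = frozenset((a, r[a]) for a in combo)
                groups.setdefault(key, Counter())[r[target]] += 1
            for antecedent, counts in groups.items():
                value, hits = counts.most_common(1)[0]
                conf = hits / sum(counts.values())
                if hits >= min_support and conf >= min_conf:
                    rules.append((antecedent, value, conf))
    return rules


def match_fraction(antecedent, row):
    """Fraction of antecedent items that agree with the row's observed values."""
    agree = sum(1 for a, v in antecedent if row.get(a) == v)
    return agree / len(antecedent)


def impute_by_rules(row, rules, min_match=0.5):
    """Partial matching (assumed scoring): a rule fires if >= min_match of its items agree.

    Each firing rule is scored by confidence * match fraction; the best
    rule's consequent is returned, or None if no rule fires.
    """
    best, best_score = None, 0.0
    for antecedent, value, conf in rules:
        frac = match_fraction(antecedent, row)
        if frac >= min_match and conf * frac > best_score:
            best, best_score = value, conf * frac
    return best


def impute_by_knn(row, rows, target, k=3):
    """Majority vote over the k rows closest in Hamming distance on shared attributes."""
    scored = []
    for other in rows:
        if other is row or other[target] is None:
            continue
        shared = [a for a in row
                  if a != target and row[a] is not None and other[a] is not None]
        if not shared:
            continue
        dist = sum(row[a] != other[a] for a in shared) / len(shared)
        scored.append((dist, other[target]))
    scored.sort(key=lambda t: t[0])
    votes = Counter(value for _, value in scored[:k])
    return votes.most_common(1)[0][0] if votes else None


def hybrid_impute(rows, min_match=0.5, k=3):
    """Hypothetical hybrid: rules with partial matching first, k-NN as fallback."""
    attrs = list(rows[0])
    rules = {t: mine_rules(rows, t) for t in attrs}  # mined once on the original data
    filled = []
    for row in rows:
        new = dict(row)  # copy so the input rows are not modified
        for t in attrs:
            if row[t] is None:
                value = impute_by_rules(row, rules[t], min_match)
                if value is None:
                    value = impute_by_knn(row, rows, t, k)
                new[t] = value
        filled.append(new)
    return filled


# Small made-up example dataset (not from the paper's benchmarks).
data = [
    {"outlook": "sunny", "humidity": "high", "play": "no"},
    {"outlook": "sunny", "humidity": "high", "play": "no"},
    {"outlook": "sunny", "humidity": "normal", "play": "yes"},
    {"outlook": "rainy", "humidity": "normal", "play": "yes"},
    {"outlook": "rainy", "humidity": "normal", "play": "yes"},
    {"outlook": "sunny", "humidity": "high", "play": None},
    {"outlook": "overcast", "humidity": None, "play": "yes"},
]
filled = hybrid_impute(data)
print(filled[5]["play"], filled[6]["humidity"])  # → no normal
```

Ordering the rules before k-NN is just one way to form a hybrid; a weighted vote between the two predictors would be an equally reasonable alternative under the same assumptions.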