This study addresses clustering problems in which the goal is to find compact clusters that are also informative about an outcome variable. The data points are partitioned so that observations within each cluster are similar and, at the same time, the outcome variable can be predicted from the clusters. We model this semi-supervised clustering problem as a multi-objective optimization problem with two objective functions to be minimized: the deviation of data points within clusters and the prediction error of the outcome variable. To search for optimal clustering solutions, we employ the non-dominated sorting genetic algorithm II (NSGA-II), and local regression is used as the prediction method for the outcome variable. To assess the performance of the proposed model, we evaluate seven models on five real-world data sets. Furthermore, we investigate the impact of using local regression to predict the outcome variable in all models, and we compare the performance of the multi-objective models with that of single-objective models.
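The two objectives described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the within-cluster deviation is assumed to be a sum of squared Euclidean distances to the cluster centroid, and a simple per-cluster mean predictor stands in for local regression. The `pareto_front` helper only shows the non-dominance test that NSGA-II relies on; it is not NSGA-II itself.

```python
from statistics import fmean


def cluster_deviation(X, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    Assumed form of the compactness objective; the paper may define it differently.
    """
    groups = {}
    for point, k in zip(X, labels):
        groups.setdefault(k, []).append(point)
    total = 0.0
    for pts in groups.values():
        centroid = [fmean(col) for col in zip(*pts)]
        total += sum((v - c) ** 2 for p in pts for v, c in zip(p, centroid))
    return total


def prediction_error(y, labels):
    """Mean squared error when each outcome is predicted by its cluster's mean.

    The cluster-mean predictor is a stand-in for local regression.
    """
    groups = {}
    for yi, k in zip(y, labels):
        groups.setdefault(k, []).append(yi)
    means = {k: fmean(vals) for k, vals in groups.items()}
    return fmean([(yi - means[k]) ** 2 for yi, k in zip(y, labels)])


def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the objective vectors that no other candidate dominates (minimization)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]
```

Each candidate clustering is scored as a pair `(cluster_deviation(X, labels), prediction_error(y, labels))`. A multi-objective search keeps the non-dominated pairs rather than collapsing them into a single weighted score, which is what distinguishes it from the single-objective models in the comparison.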