Clustering is a fundamental problem in many areas, which aims to partition a given data set into groups based on some distance measure, such that the data points in the same group are similar while that in different groups are dissimilar. Due to its importance and NP-hardness, a lot of methods have been proposed, among which evolutionary algorithms are a class of popular ones. Evolutionary clustering has found many successful applications, but all the results are empirical, lacking theoretical support. This paper fills this gap by proving that the approximation performance of the GSEMO (a simple multi-objective evolutionary algorithm) for solving four formulations of clustering, i.e., $k$-tMM, $k$-center, discrete $k$-median and $k$-means, can be theoretically guaranteed. Furthermore, we consider clustering under fairness, which tries to avoid algorithmic bias, and has recently been an important research topic in machine learning. We prove that for discrete $k$-median clustering under individual fairness, the approximation performance of the GSEMO can be theoretically guaranteed with respect to both the objective function and the fairness constraint.
翻译:暂无翻译