Clustering algorithms are central to data analysis because they organize data into meaningful groups. However, individual clustering methods carry inherent limitations and biases, so no single method works well across diverse datasets. To address this, we introduce a robust clustering framework that integrates the Minimum Description Length (MDL) principle with a genetic optimization algorithm. The framework first uses an ensemble clustering approach to generate an initial clustering solution, then refines that solution with MDL-guided evaluation functions optimized by a genetic algorithm. This integration lets the method adapt to the dataset's intrinsic properties, which reduces its dependence on the initial clustering input and keeps the clustering process data-driven and robust. We evaluated the proposed method on thirteen benchmark datasets using four established validation metrics: accuracy, normalized mutual information (NMI), Fisher score, and adjusted Rand index (ARI). Experimental results demonstrate that our approach consistently outperforms traditional clustering methods, yielding higher accuracy, improved stability, and reduced bias. The method's adaptability makes it effective across datasets with diverse characteristics, highlighting its potential as a versatile and reliable tool for complex clustering tasks. By combining the MDL principle with genetic optimization, this study advances clustering methodology, addressing key limitations of individual methods and delivering superior performance in varied applications.
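The refinement step described above (an initial partition improved by a genetic search that minimizes a description-length score) can be illustrated with a minimal sketch. This is not the evaluation function or the genetic operators used in the study, which the abstract does not specify. It assumes a simple two-part code with a shared spherical Gaussian, and the names `mdl_cost`, `refine`, `k_max`, and `var_floor` are hypothetical, as is the toy data.

```python
import math
import random


def mdl_cost(points, labels, var_floor=1e-6):
    """Illustrative two-part code length in bits: model bits plus data bits.

    Assumption: spherical Gaussian clusters sharing one variance. The data
    term is a differential code length, so it can be negative; only
    differences between partitions are meaningful.
    """
    n, d = len(points), len(points[0])
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    k = len(clusters)

    # Sum of squared residuals to each cluster centroid.
    sse = 0.0
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        sse += sum((x - m) ** 2 for p in members for x, m in zip(p, centroid))
    var = max(sse / (n * d), var_floor)

    # Model: k*d centroid coordinates at (1/2) log2 n bits each,
    # plus log2 k bits per point to state its label.
    model_bits = 0.5 * k * d * math.log2(n) + n * math.log2(k)
    # Data: Gaussian code length of the residuals.
    data_bits = 0.5 * n * d * math.log2(2 * math.pi * math.e * var)
    return model_bits + data_bits


def refine(points, init_labels, k_max, generations=200, pop_size=20, seed=0):
    """Elitist genetic search over label vectors, minimizing mdl_cost."""
    rng = random.Random(seed)

    def mutate(labels):
        child = list(labels)
        child[rng.randrange(len(child))] = rng.randrange(k_max)
        return child

    def crossover(a, b):
        cut = rng.randrange(1, len(a))  # one-point crossover
        return a[:cut] + b[cut:]

    # Seed the population with the initial partition and mutated copies.
    population = [list(init_labels)]
    population += [mutate(init_labels) for _ in range(pop_size - 1)]

    for _ in range(generations):
        population.sort(key=lambda lab: mdl_cost(points, lab))
        elite = population[: pop_size // 2]  # survivors carry over unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite)))
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children

    return min(population, key=lambda lab: mdl_cost(points, lab))


# Toy data: two well-separated groups with a deliberately noisy start.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
initial = [0, 0, 1, 1, 1, 0]
refined = refine(points, initial, k_max=3)
print(mdl_cost(points, initial), mdl_cost(points, refined))
```

Because the best half of each generation survives unchanged, the returned partition never scores worse than the initial one. In the framework itself, the starting partition would come from the ensemble clustering step rather than a hand-written label list.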