In this paper, we present an information-theoretic method for clustering mixed-type data, that is, data consisting of both continuous and categorical variables. The proposed approach is built on the deterministic variant of the Information Bottleneck algorithm, designed to optimally compress data while preserving its relevant structural information. We evaluate the performance of our method against four well-established clustering techniques for mixed-type data -- KAMILA, K-Prototypes, Factor Analysis for Mixed Data with K-Means, and Partitioning Around Medoids using Gower's dissimilarity -- using both simulated and real-world datasets. The results highlight that the proposed approach offers a competitive alternative to traditional clustering techniques, particularly under specific conditions where heterogeneity in data poses significant challenges.
翻译:暂无翻译