Existing clustering algorithms such as K-means often require parameters such as the number of clusters K to be set in advance, and the choice of such parameters can prevent the algorithm from producing objective and consistent clustering results. This paper introduces a clustering method based on information theory in which the clusters of the result have maximum average information entropy (called entropy payload in this paper). The method offers two benefits: first, it does not require any preset hyperparameter, such as the number of clusters or a similar threshold; second, the clustering results have maximum efficiency of information expression. It can be used in image segmentation, object classification, and similar tasks, and could serve as a basis for unsupervised learning.