This paper presents a clustering algorithm that extends the Category Trees algorithm. Category Trees is a clustering method that creates tree structures which branch on category type rather than on feature. The development in this paper is to consider a secondary order of clustering, based not on the category to which a data row belongs but on the tree, representing a single classifier, with which the row is eventually clustered. Each tree branches to store subsets of other categories, and the rows in those subsets may also be related. The paper therefore examines this second level of clustering across the other-category subsets, to determine whether there is any consistency over it. It is argued that Principal Components may be a related and reciprocal type of structure, which raises a broader question about the relation between exemplars and principal components in general. The theory is demonstrated using the Portugal Forest Fires dataset as a case study. The distributed nature of that dataset can be used to create the tree categories artificially, and the output criterion can also be determined in an automatic and arbitrary way, leading to a flexible and dynamic clustering mechanism.