The greed package implements the general and flexible framework of arXiv:2002.11577 for model-based clustering in the R language. Based on the direct maximization of the exact Integrated Classification Likelihood (ICL) with respect to the partition, it jointly performs clustering and selection of the number of groups. This combinatorial problem is handled by an efficient hybrid genetic algorithm, while a final hierarchical step gives access to coarser partitions and extracts an ordering of the clusters. The methodology applies to a wide variety of latent variable models and can therefore handle various data types as well as heterogeneous data. Classical models for continuous, count, categorical and graph data are implemented, and new models can be incorporated thanks to S4 class abstraction. This paper introduces the package and the design choices that guided its development, and illustrates its usage on practical use cases.
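To make the criterion concrete, here is a minimal Python sketch (not the greed package's R API) of the exact Integrated Classification Likelihood for one simple, assumed model: a Bernoulli latent class model with Beta(a, b) priors on the per-cluster success probabilities and a symmetric Dirichlet(alpha) prior on the cluster proportions. Under these conjugate priors every parameter integrates out in closed form, so ICL = log p(X, z) is exact rather than approximated. The hybrid genetic algorithm is not reproduced here; `greedy_merge` is a simplified, hypothetical stand-in for a merge-based hierarchical step that repeatedly merges the pair of clusters giving the largest ICL and stops when no merge improves it. All function names and hyperparameter defaults are illustrative assumptions.

```python
from itertools import combinations
from math import lgamma

import numpy as np


def log_dirichlet_multinomial(counts, alpha):
    """log p(z | alpha): symmetric Dirichlet(alpha) prior on proportions, integrated out."""
    counts = np.asarray(counts, dtype=float)
    K = len(counts)
    n = counts.sum()
    return (lgamma(K * alpha) - lgamma(K * alpha + n)
            + sum(lgamma(c + alpha) - lgamma(alpha) for c in counts))


def log_beta_bernoulli(X_k, a=1.0, b=1.0):
    """log p(X_k | z): independent Beta(a, b)-Bernoulli per column, parameters integrated out."""
    n_k = X_k.shape[0]
    n1 = X_k.sum(axis=0)  # number of ones per column
    n0 = n_k - n1         # number of zeros per column
    return float(sum(
        lgamma(a + b) - lgamma(a) - lgamma(b)
        + lgamma(a + o) + lgamma(b + z) - lgamma(a + b + n_k)
        for o, z in zip(n1, n0)
    ))


def exact_icl(X, labels, alpha=1.0):
    """Exact ICL = log p(X, z) with all model parameters integrated out analytically."""
    clusters = np.unique(labels)
    counts = [np.sum(labels == k) for k in clusters]
    icl = log_dirichlet_multinomial(counts, alpha)  # prior term, depends on K
    for k in clusters:
        icl += log_beta_bernoulli(X[labels == k])   # per-cluster marginal likelihood
    return icl


def greedy_merge(X, labels, alpha=1.0):
    """Hypothetical merge step: merge the best pair of clusters while the exact ICL increases."""
    labels = labels.copy()
    current = exact_icl(X, labels, alpha)
    while len(np.unique(labels)) > 1:
        best = None
        for i, j in combinations(np.unique(labels), 2):
            trial = np.where(labels == j, i, labels)  # relabel cluster j as i
            score = exact_icl(X, trial, alpha)
            if best is None or score > best[0]:
                best = (score, trial)
        if best[0] <= current:
            break  # no merge improves the criterion
        current, labels = best
    return labels, current


# Toy data: 10 rows of all ones and 10 rows of all zeros, deliberately over-split into 4 clusters.
X = np.vstack([np.ones((10, 6), dtype=int), np.zeros((10, 6), dtype=int)])
init = np.repeat([0, 1, 2, 3], 5)
final_labels, final_icl = greedy_merge(X, init)
```

On this toy data, the over-split four-cluster partition should be merged back into the two blocks of identical rows, since merging across blocks lowers the exact ICL. Because the number of clusters K enters the prior term log p(z), comparing ICL values across partitions is what allows the number of groups to be selected together with the clustering itself.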