Co-clustering is a data mining technique that extracts the underlying block structure shared by the rows and columns of a data matrix. Many approaches have been studied and shown to recover such structures in continuous data, binary data, and contingency tables. However, very little work has addressed co-clustering of mixed-type data. In this article, we extend latent block model-based co-clustering to mixed data (continuous and binary variables). We then evaluate the effectiveness of the proposed approach on simulated data and discuss its advantages and potential limitations.