The latent block model is used to simultaneously rank the rows and columns of a matrix to reveal a block structure. The algorithms used for estimation are often time-consuming. However, recent work shows that the log-likelihood ratios are equivalent under the complete model and the observed model (in which the labels are unknown), and that the posterior distribution of the groups converges, as the size of the data increases, to a Dirac mass located at the actual group configuration. Based on these observations, this paper proposes the Largest Gaps algorithm, which performs clustering using only the marginals of the matrix, for binary data when the number of blocks is very small relative to the size of the whole matrix. In addition, a model selection method is incorporated, together with a proof of its consistency. Thus, this paper shows that studying simplistic configurations (few blocks compared to the size of the matrix, or very contrasting blocks) with complex algorithms is useless, since the marginals already give very good parameter and classification estimates.
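To make the idea of clustering from the marginals alone concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's procedure: the cut rule (sort the row or column means and split at the largest gaps) is inferred from the algorithm's name, the number of groups is taken as given rather than selected, and the function names `largest_gaps_labels` and `cluster_binary_matrix` are hypothetical.

```python
def largest_gaps_labels(marginals, n_groups):
    """Group 1-D marginals by cutting the sorted values at the largest gaps.

    Assumption: n_groups is known in advance. The model selection step
    mentioned in the abstract is not attempted here.
    """
    # Indices of the marginals, sorted by increasing value.
    order = sorted(range(len(marginals)), key=lambda i: marginals[i])
    # Gap between each pair of consecutive sorted marginals.
    gaps = [marginals[order[r + 1]] - marginals[order[r]]
            for r in range(len(order) - 1)]
    # The (n_groups - 1) largest gaps become group boundaries.
    by_size = sorted(range(len(gaps)), key=lambda r: gaps[r], reverse=True)
    cuts = set(by_size[: n_groups - 1])
    labels = [0] * len(marginals)
    group = 0
    for rank, idx in enumerate(order):
        labels[idx] = group
        if rank in cuts:  # a boundary follows this sorted position
            group += 1
    return labels


def cluster_binary_matrix(matrix, n_row_groups, n_col_groups):
    """Cluster rows and columns of a 0/1 matrix using only its marginals."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    return (largest_gaps_labels(row_means, n_row_groups),
            largest_gaps_labels(col_means, n_col_groups))


# Toy matrix with one dense block in the top-left corner.
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(cluster_binary_matrix(toy, 2, 2))  # ([1, 1, 0, 0], [1, 1, 0, 0])
```

The sketch never looks at individual entries beyond summing them, which is why this kind of approach is cheap compared with iterative estimation. It can only separate groups whose marginals differ, consistent with the abstract's restriction to few or strongly contrasting blocks.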