Biclustering is a method for detecting homogeneous submatrices in an observed matrix, and it is an effective tool for relational data analysis. Although many studies estimate the underlying bicluster structure of a matrix, few allow us to determine the appropriate number of biclusters in an observed matrix. Recently, a statistical test on the number of biclusters was proposed for a regular-grid bicluster structure, in which the latent bicluster structure is assumed to be representable by row-column clustering. However, when the latent bicluster structure does not satisfy this regular-grid assumption, the previous test requires a larger number of biclusters than necessary (i.e., a finer bicluster structure than necessary) for the null hypothesis to be accepted, which makes the accepted bicluster structure harder to interpret. In this study, we propose a new statistical test on the number of biclusters that does not require the regular-grid assumption, and we derive the asymptotic behavior of the proposed test statistic in both the null and alternative cases. We illustrate the effectiveness of the proposed method by applying it to both synthetic and practical relational data matrices.