Goodness-of-fit Test on the Number of Biclusters in Relational Data Matrix

Biclustering is a method for detecting homogeneous submatrices in a given observed matrix, and it is an effective tool for relational data analysis. Although many studies estimate the underlying bicluster structure of a matrix, few enable us to determine the appropriate number of biclusters in an observed matrix. Recently, a statistical test on the number of biclusters was proposed for a regular-grid bicluster structure, in which the latent bicluster structure is assumed to be representable by row-column clustering. However, when the latent bicluster structure does not satisfy this regular-grid assumption, the previous test requires a larger number of biclusters than necessary (i.e., a finer bicluster structure than necessary) for the null hypothesis to be accepted, which is undesirable for interpreting the accepted bicluster structure. In this study, we propose a new statistical test on the number of biclusters that does not require the regular-grid assumption, and we derive the asymptotic behavior of the proposed test statistic in both the null and alternative cases. To develop the proposed test, we construct a consistent submatrix localization algorithm, that is, an algorithm for which the probability of outputting the correct bicluster structure converges to one. We illustrate the effectiveness of the proposed method by applying it to both synthetic and practical relational data matrices.
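To make the regular-grid issue concrete, the following minimal Python sketch counts how many blocks a row-column clustering needs in order to represent a small bicluster structure that is not on a regular grid. It is an illustration only, not the proposed test or the localization algorithm: the function names, the 10 x 10 size, and the bicluster positions are hypothetical choices, and the sketch assumes that the background and each bicluster have mutually distinct means, so that rows (or columns) can share a cluster only if their label patterns are identical.

```python
def label_matrix(n_rows, n_cols, biclusters):
    """Return a grid of labels: 0 = background, k = k-th bicluster (hypothetical helper)."""
    labels = [[0] * n_cols for _ in range(n_rows)]
    for k, (rows, cols) in enumerate(biclusters, start=1):
        for i in rows:
            for j in cols:
                labels[i][j] = k
    return labels


def regular_grid_size(labels):
    """Count the row and column clusters of the coarsest regular grid.

    Under the distinct-means assumption, two rows can be merged only if
    their label patterns coincide, so the number of distinct patterns is
    the minimum number of clusters. The same holds for columns.
    """
    row_groups = {tuple(row) for row in labels}
    col_groups = {tuple(col) for col in zip(*labels)}
    return len(row_groups), len(col_groups)


# Two disjoint biclusters whose row ranges overlap, so they do not
# line up on a common row-column grid (positions are illustrative).
biclusters = [
    (range(0, 4), range(0, 4)),  # bicluster 1: rows 0-3, columns 0-3
    (range(2, 7), range(5, 9)),  # bicluster 2: rows 2-6, columns 5-8
]
labels = label_matrix(10, 10, biclusters)

n_row_groups, n_col_groups = regular_grid_size(labels)
n_grid_blocks = n_row_groups * n_col_groups

print("biclusters in the latent structure:", len(biclusters))
print("row clusters x column clusters:", n_row_groups, "x", n_col_groups)
print("blocks required by a regular grid:", n_grid_blocks)
```

In this toy case the latent structure has two biclusters on a background, yet a regular grid needs 4 row clusters and 3 column clusters, that is, 12 blocks, to represent it. This is the kind of over-refinement that a test built on the regular-grid assumption would need before accepting the null hypothesis.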
