Model selection in latent block models is a challenging but important task in statistics. In particular, a major challenge arises when constructing a test on a block structure obtained by applying a specific clustering algorithm to a finite-size matrix. In this case, it is crucial to account for the selective bias in the block structure: the clustering algorithm selects the block structure from all possible cluster memberships according to some criterion. To address this problem, this study provides a selective inference method for latent block models. Specifically, we construct a statistical test on a set of row and column cluster memberships of a latent block model, obtained by a squared residue minimization algorithm. By construction, the proposed test subsumes a test on the numbers of row and column clusters, and can therefore also be used for that purpose. We also propose an approximate version of the test that uses simulated annealing to avoid the combinatorial explosion of searching for the optimal block structure. The results show that both the exact and the approximate tests work effectively, compared with the naive test that does not take the selective bias into account.
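To make the selection step concrete, the following is a minimal sketch of the clustering objective and the two ways of searching it. It does not implement the selective test itself, and it is not the authors' implementation. It rests on several assumptions. The squared residue is measured against each block's mean, which is one common variant of squared residue minimization. The exact search enumerates all labelings. Annealing moves one row or column label at a time under a Metropolis acceptance rule with geometric cooling. All function names and parameters (`squared_residue`, `exhaustive`, `anneal`, `t0`, `cooling`) are hypothetical.

```python
import itertools
import math
import random
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Result = Tuple[float, List[int], List[int]]


def squared_residue(a: Matrix, rows: Sequence[int], cols: Sequence[int],
                    k: int, h: int) -> float:
    """Sum of squared deviations of each entry from its block mean.

    Assumed variant: rows[i] in range(k) and cols[j] in range(h) are the
    row/column cluster labels; empty blocks simply contribute nothing.
    """
    sums = [[0.0] * h for _ in range(k)]
    counts = [[0] * h for _ in range(k)]
    for i, row in enumerate(a):
        for j, x in enumerate(row):
            sums[rows[i]][cols[j]] += x
            counts[rows[i]][cols[j]] += 1
    total = 0.0
    for i, row in enumerate(a):
        for j, x in enumerate(row):
            r, c = rows[i], cols[j]
            mean = sums[r][c] / counts[r][c]  # count >= 1: (i, j) lies in this block
            total += (x - mean) ** 2
    return total


def exhaustive(a: Matrix, k: int, h: int) -> Result:
    """Exact minimizer by brute force over k**n * h**p labelings.

    This enumeration is the combinatorial explosion that makes the exact
    search infeasible beyond tiny matrices.
    """
    n, p = len(a), len(a[0])
    best: Optional[Result] = None
    for rows in itertools.product(range(k), repeat=n):
        for cols in itertools.product(range(h), repeat=p):
            s = squared_residue(a, rows, cols, k, h)
            if best is None or s < best[0]:
                best = (s, list(rows), list(cols))
    assert best is not None
    return best


def anneal(a: Matrix, k: int, h: int, steps: int = 2000, t0: float = 1.0,
           cooling: float = 0.995, seed: int = 0) -> Result:
    """Approximate minimizer by simulated annealing (hypothetical settings)."""
    rng = random.Random(seed)
    n, p = len(a), len(a[0])
    rows = [rng.randrange(k) for _ in range(n)]
    cols = [rng.randrange(h) for _ in range(p)]
    cur = squared_residue(a, rows, cols, k, h)
    best: Result = (cur, rows[:], cols[:])
    t = t0
    for _ in range(steps):
        # Pick a row or a column (proportionally to their counts) to relabel.
        labels, m = (rows, k) if rng.random() < n / (n + p) else (cols, h)
        if m < 2:
            continue  # only one cluster on this axis: nothing to move
        idx = rng.randrange(len(labels))
        old = labels[idx]
        labels[idx] = rng.choice([c for c in range(m) if c != old])
        new = squared_residue(a, rows, cols, k, h)
        delta = new - cur
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur = new
            if cur < best[0]:
                best = (cur, rows[:], cols[:])
        else:
            labels[idx] = old  # reject: undo the move
        t *= cooling
    return best
```

For example, on `a = [[1, 1, 5, 5], [1, 1, 5, 5], [9, 9, 0, 0]]` with `k = h = 2`, `exhaustive(a, 2, 2)` enumerates all 128 labelings and finds a residue of 0. `anneal(a, 2, 2)` only evaluates the labelings it visits, so its result can only match or exceed the exact minimum. The sketch recomputes the full objective at every step for clarity; an incremental update of the block sums would be the natural optimization.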