Clustering bipartite graphs is a fundamental task in network analysis. In the high-dimensional regime where the number of rows $n_1$ and the number of columns $n_2$ of the associated adjacency matrix are of different orders, existing methods derived from those used for symmetric graphs can come with sub-optimal guarantees. Given the increasing number of applications of bipartite graphs in the high-dimensional regime, it is of fundamental importance to design optimal algorithms for this setting. The recent work of Ndaoud et al. (2022) improves the existing upper bound on the misclustering rate in the special case where the columns (resp. rows) can be partitioned into $L = 2$ (resp. $K = 2$) communities. Unfortunately, their algorithm cannot be extended to the more general setting where $K \neq L \geq 2$. We overcome this limitation by introducing a new algorithm based on the power method. We derive conditions for exact recovery in the general setting where $K \neq L \geq 2$, and show that this guarantee recovers the result of Ndaoud et al. (2022). We also derive a minimax lower bound on the misclustering error when $K = L = 2$, which matches the corresponding upper bound up to a constant factor.