We consider a matrix completion problem that exploits social or item similarity graphs as side information. We develop a universal, parameter-free, and computationally efficient algorithm that starts with hierarchical graph clustering and then iteratively refines estimates both on graph clustering and matrix ratings. Under a hierarchical stochastic block model that well respects practically-relevant social graphs and a low-rank rating matrix model (to be detailed), we demonstrate that our algorithm achieves the information-theoretic limit on the number of observed matrix entries (i.e., optimal sample complexity) that is derived by maximum likelihood estimation together with a lower-bound impossibility result. One consequence of this result is that exploiting the hierarchical structure of social graphs yields a substantial gain in sample complexity relative to the one that simply identifies different groups without resorting to the relational structure across them. We conduct extensive experiments both on synthetic and real-world datasets to corroborate our theoretical results as well as to demonstrate significant performance improvements over other matrix completion algorithms that leverage graph side information.
翻译:我们考虑的是将社会或项目相似的图表作为侧面信息的矩阵完成问题。 我们开发了一个通用的、无参数的和计算效率高的算法,首先从高层次的图形群集开始,然后迭接地完善对图形群集和矩阵评级的估算。 在一个非常尊重实际相关的社会图表和低等级评级矩阵模型的等级分块模型(有待详细分析)下,我们证明我们的算法在所观察到的矩阵条目(即最佳样本复杂性)的数量上达到了信息理论极限,这是通过最大可能性的估算和较低范围不可能的结果得出的。这一结果的一个后果是,利用社会图表的等级结构使抽样复杂性比简单地确定不同组群而不必使用它们之间的关联结构的样本复杂得多。我们在合成和真实世界数据集上进行了广泛的实验,以证实我们的理论结果,并展示相对于其他矩阵完成算法的显著性改进,这些算出图形侧信息。