On the Fundamental Limits of Matrix Completion: Leveraging Hierarchical Similarity Graphs

From arXiv. The first two authors contributed equally to this work. A preliminary version of this work was presented at the 2020 Advances in Neural Information Processing Systems Conference (NeurIPS 2020).

We study the matrix completion problem that leverages hierarchical similarity graphs as side information in the context of recommender systems. Under a hierarchical stochastic block model that captures practically relevant social graphs and a low-rank rating matrix model, we characterize the exact information-theoretic limit on the number of observed matrix entries (i.e., the optimal sample complexity) by proving sharp upper and lower bounds on the sample complexity. In the achievability proof, we demonstrate that the probability of error of the maximum likelihood estimator vanishes for a sufficiently large number of users and items, provided that all sufficient conditions are satisfied. The converse (impossibility) proof, on the other hand, is based on the genie-aided maximum likelihood estimator. Under each necessary condition, we present examples of a genie-aided estimator to prove that the probability of error does not vanish for a sufficiently large number of users and items. One important consequence of this result is that exploiting the hierarchical structure of social graphs yields a substantial gain in sample complexity relative to an approach that simply identifies different groups without exploiting the relational structure across them. More specifically, we analyze the optimal sample complexity and identify different regimes whose characteristics depend on quality metrics of the side information provided by the hierarchical similarity graph. Finally, we present simulation results that corroborate our theoretical findings and show that the characterized information-theoretic limit can be asymptotically achieved.
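To make the side-information model more concrete, the following is a minimal sketch of how a graph from a hierarchical stochastic block model could be sampled. It is an illustration only, not the construction used in the paper: the two-level hierarchy (groups nested inside clusters), the function name `sample_hierarchical_sbm`, and the three edge probabilities `p_group`, `p_cluster`, and `p_cross` are all assumptions introduced here, since the abstract does not specify the model parameters.

```python
import numpy as np

def sample_hierarchical_sbm(cluster_of_group, group_sizes,
                            p_group, p_cluster, p_cross, seed=0):
    """Sample an undirected graph from a two-level hierarchical SBM (hypothetical setup).

    cluster_of_group[g] is the cluster that group g belongs to.
    Two users are connected with probability
      p_group   if they are in the same group,
      p_cluster if they are in different groups of the same cluster,
      p_cross   if they are in different clusters.
    """
    rng = np.random.default_rng(seed)
    # Group and cluster label of every user.
    group = np.repeat(np.arange(len(group_sizes)), group_sizes)
    cluster = np.asarray(cluster_of_group)[group]
    n = group.size

    # Pairwise edge probabilities determined by the hierarchy.
    same_group = group[:, None] == group[None, :]
    same_cluster = cluster[:, None] == cluster[None, :]
    probs = np.where(same_group, p_group,
                     np.where(same_cluster, p_cluster, p_cross))

    # Draw each pair once (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(int)
    return adj, group, cluster

# Example: two clusters, each containing two groups of 50 users (assumed sizes).
adj, group, cluster = sample_hierarchical_sbm(
    cluster_of_group=[0, 0, 1, 1],
    group_sizes=[50, 50, 50, 50],
    p_group=0.5, p_cluster=0.2, p_cross=0.05,
)
```

In this sketch, choosing `p_group > p_cluster > p_cross` encodes the idea that users in the same group are more similar than users who merely share a cluster. A method that ignores the hierarchy would, in effect, treat `p_cluster` and `p_cross` as a single value.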
