We propose orthogonal inductive matrix completion (OMIC), an interpretable approach to inductive matrix completion based on a sum of multiple orthonormal side information terms, together with nuclear-norm regularization. The approach allows us to inject prior knowledge about the eigenvectors of the ground truth matrix. We fit the model with a provably convergent algorithm that optimizes all of its components simultaneously. Our method enjoys distribution-free learning guarantees that improve with the quality of the injected knowledge. As a special case of our general framework, we study a model consisting of a sum of user and item biases (generic behavior), a non-inductive term (specific behavior), and (optionally) an inductive term using side information. Our theoretical analysis shows that $\epsilon$-recovering a ground truth matrix in $\mathbb{R}^{m\times n}$ requires at most $O\left( \frac{n+m+(\sqrt{n}+\sqrt{m}) \sqrt{mnr}C}{\epsilon^2}\right)$ entries, where $r$ (resp. $C$) is the rank (resp. maximum entry) of the bias-free part of the ground truth matrix. We analyze the performance of OMIC on several synthetic and real datasets. On synthetic datasets with a sliding scale of user bias relevance, we show that OMIC adapts to different regimes better than competing methods. On real-life datasets containing user/item recommendations and relevant side information, we find that OMIC surpasses the state of the art, with the added benefit of greater interpretability.
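To make the model structure concrete, here is a minimal sketch of the objective implied by the abstract, under assumed notation (the symbols $Z$, $\Omega$, $X^{(k)}$, $Y^{(l)}$, $M^{(k,l)}$, and $\lambda_{k,l}$ are illustrative, not fixed by the text above): the estimate is a sum of side information terms $\sum_{k,l} X^{(k)} M^{(k,l)} (Y^{(l)})^\top$, where each $X^{(k)} \in \mathbb{R}^{m \times d_k}$ (resp. $Y^{(l)} \in \mathbb{R}^{n \times d'_l}$) has orthonormal columns encoding prior knowledge about the row (resp. column) eigenvectors of the ground truth $Z$, and the core matrices $M^{(k,l)}$ are learned via
\[
\min_{\{M^{(k,l)}\}} \; \sum_{(i,j)\in\Omega} \Big( \Big[ \sum_{k,l} X^{(k)} M^{(k,l)} \big(Y^{(l)}\big)^{\top} \Big]_{ij} - Z_{ij} \Big)^{2} \;+\; \sum_{k,l} \lambda_{k,l} \big\| M^{(k,l)} \big\|_{*},
\]
with $\Omega$ the set of observed entries and $\|\cdot\|_{*}$ the nuclear norm. In this notation, the special case above would take $X^{(1)} = \tfrac{1}{\sqrt{m}}\mathbf{1}_m$ and $Y^{(1)} = \tfrac{1}{\sqrt{n}}\mathbf{1}_n$ (constant directions yielding user and item biases), their orthogonal complements for the non-inductive term, and any remaining orthonormal side information for the optional inductive term.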