In many practical settings, matrix factorization methods suffer from poor data quality, such as high data sparsity and a low signal-to-noise ratio (SNR). Here we address matrix factorization by using auxiliary information, which is widely available in real applications, to overcome the challenges caused by poor data quality. Unlike existing methods, which mainly rely on simple linear models to combine auxiliary information with the main data matrix, we propose MFAI, which integrates gradient boosted trees into the probabilistic matrix factorization framework to leverage auxiliary information effectively. MFAI thus naturally inherits several salient features of gradient boosted trees, such as the ability to flexibly model nonlinear relationships and robustness to irrelevant features and missing values in the auxiliary information. The parameters in MFAI can be determined automatically under the empirical Bayes framework, making the method adaptive in its use of auxiliary information and immune to overfitting. Moreover, by exploiting variational inference, MFAI is computationally efficient and scalable to large datasets. We demonstrate the advantages of MFAI through comprehensive numerical results from simulation studies and real data analysis. Our approach is implemented in the R package mfair, available at https://github.com/YangLabHKUST/mfair.
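To make the idea concrete, the following is a minimal toy sketch in Python of how tree-based auxiliary information might enter a matrix factorization. It is not the mfair implementation and rests on several simplifying assumptions: a single factor (rank 1), MAP-style alternating least squares in place of variational inference, fixed hyperparameters `beta` and `lam` in place of empirical Bayes estimates, and depth-1 stumps in place of full boosted trees. All function and parameter names (`fit_stump`, `predict_stump`, `toy_mfai_rank1`) are hypothetical.

```python
import numpy as np


def fit_stump(X, r):
    """Fit a depth-1 regression tree (stump) to targets r by least squares."""
    best_sse = np.inf
    stump = (0, -np.inf, r.mean(), r.mean())  # fallback: constant prediction
    for j in range(X.shape[1]):
        # Candidate thresholds: every observed value except the largest,
        # so both sides of the split are non-empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best_sse:
                best_sse, stump = sse, (j, t, lv, rv)
    return stump


def predict_stump(stump, X):
    """Evaluate a stump (feature, threshold, left value, right value) on X."""
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)


def toy_mfai_rank1(Y, mask, X, n_iter=30, beta=1.0, lam=1.0, lr=0.5, seed=0):
    """Toy rank-1 factorization Y ~ z w^T on the observed entries only.

    The row factor z is pulled toward f = F(X), where F is a sum of stumps
    grown one boosting step per iteration (assumption: fixed beta, lam, lr).
    """
    M = mask.astype(float)
    Y0 = np.where(mask, Y, 0.0)  # zero out unobserved entries (also hides NaN)
    w = np.random.default_rng(seed).normal(size=Y.shape[1])
    f = np.zeros(Y.shape[0])
    for _ in range(n_iter):
        # Ridge-type update of z, shrunk toward the tree prediction f.
        z = (Y0 @ w + beta * f) / (M @ w**2 + beta)
        # Ridge-type update of w, shrunk toward zero.
        w = (Y0.T @ z) / (M.T @ z**2 + lam)
        # One boosting step: fit a stump to the residual z - f.
        f = f + lr * predict_stump(fit_stump(X, z - f), X)
    return z, w, f


# Synthetic example: the row factor depends nonlinearly on the first feature,
# the other two features are irrelevant, and half the entries are unobserved.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
z_true = np.where(X[:, 0] > 0, 2.0, -2.0)
w_true = rng.normal(size=40)
Y = np.outer(z_true, w_true) + 0.5 * rng.normal(size=(100, 40))
mask = rng.random((100, 40)) < 0.5
z, w, f = toy_mfai_rank1(Y, mask, X)
```

In this sketch, `f` plays the role of the auxiliary-information prediction for the row factor, so rows with few observed entries lean on `f` rather than on their own sparse data. The sign and scale of `z` and `w` are identified only up to their product.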