In this paper, we introduce a probabilistic model for learning the interpolative decomposition (ID), a factorization commonly used for feature selection, low-rank approximation, and identifying hidden patterns in data. In the model, the matrix factors are latent variables associated with each data dimension. Prior densities with support on the specified subspace are used to enforce the constraint on the magnitude of the factored component of the observed matrix, and Bayesian inference is carried out by Gibbs sampling. We evaluate the model on a variety of real-world datasets of different sizes and dimensions, including CCLE EC50, CCLE IC50, CTRP EC50, and MovieLens 100K, and show that the proposed Bayesian ID models, GBT and GBTN, achieve smaller reconstruction errors than existing randomized approaches.
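For readers unfamiliar with ID, its structure is A ≈ A[:, J] W: a subset J of k columns of A, multiplied by an interpolation matrix W that contains the k×k identity on those columns. The minimal NumPy sketch below illustrates only that structure. It uses a standard deterministic construction (greedy column pivoting with Gram-Schmidt deflation, then least squares). It is not the Bayesian GBT/GBTN models and not the randomized baselines. The function name `interpolative_decomposition` and the test matrix are hypothetical.

```python
import numpy as np


def interpolative_decomposition(A, k):
    """Rank-k column ID, A ~= A[:, J] @ W (illustrative sketch only).

    Hypothetical helper, NOT the paper's Bayesian method: columns are chosen
    greedily by largest residual norm (column-pivoted Gram-Schmidt), then W
    is fitted by least squares. Returns (J, W), where J lists k column
    indices of A and W is k x n with W[:, J] equal to the identity.
    """
    A = np.asarray(A, dtype=float)
    R = A.copy()  # residual columns, deflated as pivots are chosen
    J = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        norms[J] = -1.0  # never re-select an already chosen column
        j = int(np.argmax(norms))
        q = R[:, j] / norms[j]  # unit vector along the new pivot's residual
        R = R - np.outer(q, q @ R)  # remove that direction from every column
        J.append(j)
    C = A[:, J]  # the selected columns of A
    # Least-squares coefficients expressing every column of A in terms of C.
    W, *_ = np.linalg.lstsq(C, A, rcond=None)
    W[:, J] = np.eye(k)  # exact identity on the selected columns
    return J, W


# Usage: an exactly rank-3 matrix is reproduced from 3 of its own columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 20))
J, W = interpolative_decomposition(A, 3)
rel_err = np.linalg.norm(A - A[:, J] @ W) / np.linalg.norm(A)
print(J, rel_err)
```

In the proposed model, the factors are instead latent variables whose priors encode the magnitude constraint, and they are inferred by Gibbs sampling rather than computed by a single deterministic pass.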