In this paper, we propose a probabilistic model with automatic relevance determination (ARD) for learning the interpolative decomposition (ID), which is commonly used for low-rank approximation, feature selection, and identifying hidden patterns in data. The matrix factors are treated as latent variables associated with each data dimension. Prior densities with support on the specified subspace are used to enforce the magnitude constraint on the factored component of the observed matrix. A Bayesian inference procedure based on Gibbs sampling is employed. We evaluate the model on a variety of real-world datasets of different sizes and dimensions, including the CCLE $EC50$, CCLE $IC50$, Gene Body Methylation, and Promoter Methylation datasets, and show that the proposed Bayesian ID algorithms with ARD yield smaller reconstruction errors than vanilla Bayesian ID algorithms, even when the latter fix the latent dimension to the matrix rank.
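For readers unfamiliar with the decomposition, the standard (non-Bayesian) column ID can be stated as below. The symbols $A$, $C$, $W$, $J$, $M$, $N$, and $r$ are notation assumed here for illustration and do not appear in the abstract; the entrywise bound on $W$ is the usual textbook form of the magnitude constraint, and is presumably the constraint the priors are designed to respect.

```latex
% Column interpolative decomposition of an observed matrix A (assumed notation).
% J is an index set of r selected columns of A; C collects those columns.
\[
  A \approx C\,W,
  \qquad A \in \mathbb{R}^{M \times N},\quad
  C = A[:, J] \in \mathbb{R}^{M \times r},\quad
  W \in \mathbb{R}^{r \times N},
\]
% W reproduces the selected columns exactly and is bounded entrywise.
\[
  W[:, J] = I_r,
  \qquad
  \max_{k,\,l}\, \lvert w_{kl} \rvert \le 1 .
\]
```

Under this reading, $r$ plays the role of the latent dimension: the vanilla model fixes it in advance, whereas the ARD variant lets the data determine how many columns are retained.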