Many real-world systems are described not only by data from a single source but also by multiple data views. In genomic medicine, for instance, patients can be characterized by data from different molecular layers. Latent variable models with structured sparsity are a commonly used tool for disentangling variation within and across data views. However, interpreting them is cumbersome, since each factor must be inspected and interpreted directly by domain experts. Here, we propose MuVI, a novel multi-view latent variable model based on a modified horseshoe prior for modeling structured sparsity. This prior facilitates the incorporation of limited and noisy domain knowledge, thereby allowing multi-view data to be analyzed in an inherently explainable manner. We demonstrate that our model (i) outperforms state-of-the-art approaches for modeling structured sparsity in terms of reconstruction error and precision/recall, (ii) robustly integrates noisy domain expertise in the form of feature sets, (iii) promotes the identifiability of factors, and (iv) infers interpretable and biologically meaningful axes of variation in a real-world multi-view dataset of cancer patients.