In many biomedical studies, multiple views of the data (e.g., genomics, proteomics) are available, and a particular goal may be to detect sample subgroups characterized by specific groups of variables. Biclustering methods are well suited to this problem because they assume that specific groups of variables may be relevant only to specific groups of samples. Many biclustering methods exist for identifying row-column clusters in a single view, but few exist for data from multiple views. The few existing multi-view algorithms depend heavily on regularization parameters to obtain row-column clusters, which places an unnecessary burden on users and limits their use in practice. We extend an existing biclustering method based on sparse singular value decomposition for single-view data to data from multiple views. Our method, integrative sparse singular value decomposition (iSSVD), incorporates stability selection to control Type I error rates, estimates the probability that samples and variables belong to a bicluster, finds stable biclusters, and yields interpretable row-column associations. Simulations and real data analyses show that iSSVD outperforms several other single- and multi-view biclustering methods and is able to detect meaningful biclusters. iSSVD is a user-friendly, computationally efficient algorithm that will be useful in many disease subtyping applications.
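To make the core idea concrete, here is a minimal pure-Python sketch. It is not the authors' iSSVD implementation. It fits a rank-one sparse SVD in which all views share one sample (row) loading `u` while each view k has its own sparse variable (column) loading `v_k`. This fit is wrapped in stability selection: the model is refit on random half-subsamples of the samples, and selection frequencies are reported as estimated membership probabilities. Several choices here are simplifying assumptions. Sparsity comes from keeping a fixed number of largest-magnitude entries (`q_rows`, `q_cols`) rather than from a regularization parameter or an error-rate-controlled cutoff. The subsample fraction of 0.5 and the 0.6 probability cutoff are illustrative. All function names (`sparse_rank_one`, `stability_selection`, `keep_top`) are hypothetical.

```python
import random


def matvec(X, v):
    # X is an n x p matrix stored as a list of rows; returns X @ v (length n)
    return [sum(a * b for a, b in zip(row, v)) for row in X]


def tmatvec(X, u):
    # returns X^T @ u (length p)
    return [sum(X[i][j] * u[i] for i in range(len(X))) for j in range(len(X[0]))]


def keep_top(x, q):
    # hard sparsity (an assumption): keep the q largest |x_j|, zero the rest,
    # then rescale to unit Euclidean norm
    keep = set(sorted(range(len(x)), key=lambda j: -abs(x[j]))[:q])
    y = [xj if j in keep else 0.0 for j, xj in enumerate(x)]
    norm = sum(t * t for t in y) ** 0.5
    return [t / norm for t in y] if norm > 0 else y


def sparse_rank_one(views, q_rows, q_cols, iters=30):
    # one shared sample loading u, one sparse variable loading v_k per view
    n = len(views[0])
    # heuristic start: weight each sample by its total squared value across views
    u = [sum(x * x for X in views for x in X[i]) for i in range(n)]
    norm = sum(t * t for t in u) ** 0.5 or 1.0
    u = [t / norm for t in u]
    vs = []
    for _ in range(iters):
        # view-specific variable loadings given the shared sample loading
        vs = [keep_top(tmatvec(X, u), q) for X, q in zip(views, q_cols)]
        # the shared sample loading pools evidence from every view
        s = [0.0] * n
        for X, v in zip(views, vs):
            for i, t in enumerate(matvec(X, v)):
                s[i] += t
        u = keep_top(s, q_rows)
    return u, vs


def stability_selection(views, q_rows, q_cols, n_sub=100, frac=0.5, seed=0):
    # refit on random subsamples of samples; return per-sample and per-variable
    # selection frequencies (estimated probabilities of bicluster membership)
    rng = random.Random(seed)
    n = len(views[0])
    m = int(frac * n)
    q_sub = max(1, round(q_rows * m / n))  # expected bicluster rows per subsample
    row_hits, row_seen = [0] * n, [0] * n
    col_hits = [[0] * len(X[0]) for X in views]
    for _ in range(n_sub):
        idx = rng.sample(range(n), m)
        sub = [[X[i] for i in idx] for X in views]
        u, vs = sparse_rank_one(sub, q_sub, q_cols)
        for pos, i in enumerate(idx):
            row_seen[i] += 1
            if u[pos] != 0.0:
                row_hits[i] += 1
        for k, v in enumerate(vs):
            for j, vj in enumerate(v):
                if vj != 0.0:
                    col_hits[k][j] += 1
    row_prob = [h / seen if seen else 0.0 for h, seen in zip(row_hits, row_seen)]
    col_prob = [[h / n_sub for h in hits] for hits in col_hits]
    return row_prob, col_prob


# Usage on simulated data: samples 0-5 share elevated values on
# variables 0-3 of view 1 and variables 0-2 of view 2.
rng = random.Random(1)
n, p1, p2 = 20, 15, 10
X1 = [[rng.gauss(0, 1) + (4.0 if i < 6 and j < 4 else 0.0) for j in range(p1)] for i in range(n)]
X2 = [[rng.gauss(0, 1) + (4.0 if i < 6 and j < 3 else 0.0) for j in range(p2)] for i in range(n)]
row_prob, col_prob = stability_selection([X1, X2], q_rows=6, q_cols=[4, 3], n_sub=50)
pi_thr = 0.6  # selection-probability cutoff (an assumption)
print("stable samples:", [i for i, p in enumerate(row_prob) if p >= pi_thr])
print("stable variables per view:", [[j for j, p in enumerate(c) if p >= pi_thr] for c in col_prob])
```

The views are linked only through the shared `u`. As a result, a sample tends to be selected when its combined signal on the selected variables of all views is strong, while each view keeps its own variable set. A complete method would presumably also extract further biclusters, for example by removing each fitted layer and repeating. This sketch stops at one.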