Rapid technological advances now allow molecular profiling across multiple omics domains from a single sample, supporting clinical decision making in many diseases, especially cancer. Because tumor development and progression are dynamic biological processes involving composite genomic aberrations, the key challenges are to effectively assimilate information from these domains in order to identify druggable genomic signatures and biological entities, develop accurate risk prediction profiles for future patients, and identify novel patient subgroups for tailored therapy and monitoring. We propose integrative probabilistic frameworks for high-dimensional multiple-domain cancer data that coherently incorporate dependence within and between domains to accurately detect tumor subtypes, thus providing a catalogue of genomic aberrations associated with cancer taxonomy. We propose a flexible and scalable Bayesian nonparametric framework for simultaneous clustering of both tumor samples and genomic probes, and we describe an efficient variable selection procedure to identify relevant genomic aberrations that can potentially reveal underlying drivers of a disease. Although the work is motivated by several investigations related to lung cancer, the proposed methods are broadly applicable in a variety of contexts involving high-dimensional data. The performance of the methodology is demonstrated using artificial data and lung cancer omics profiles publicly available from The Cancer Genome Atlas.