In cancer research, high-throughput profiling has been conducted extensively. Recent studies have carried out integrative analysis of data on multiple cancer patient groups/subgroups. Such analysis has the potential to reveal genomic commonality as well as difference across groups/subgroups. However, methods in the existing literature that pay special attention to genomic commonality and difference are very limited. In this study, a novel estimation and marker selection method based on the sparse boosting technique is developed to address the commonality/difference problem. In terms of technical innovation, a new penalty and a new computation of increments are introduced. The proposed method can also effectively accommodate the grouping structure of covariates. Simulation shows that it can outperform direct competitors under a wide spectrum of settings. Two TCGA (The Cancer Genome Atlas) datasets are analyzed, showing that the proposed analysis can identify markers with important biological implications and has satisfactory prediction performance and stability.