In genomic analysis, identifying markers associated with cancer outcomes or phenotypes is important but challenging. Motivated by the biological mechanisms of cancer and the characteristics of the available datasets, this paper proposes a novel integrative interaction approach under a semiparametric model, in which genetic factors and environmental factors enter as the parametric and nonparametric components, respectively. The goal of the approach is to identify the genetic factors and gene-gene interactions associated with cancer outcomes while simultaneously estimating the nonlinear effects of environmental factors. The proposed approach is based on the threshold gradient directed regularization (TGDR) technique. Simulation studies indicate that the proposed approach outperforms the alternative methods in identifying main effects and interactions, and has favorable estimation and prediction accuracy. Analysis of non-small-cell lung carcinoma (NSCLC) datasets from The Cancer Genome Atlas (TCGA) shows that the proposed approach can identify markers with important implications and performs favorably in prediction accuracy, identification stability, and computational cost.
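As a rough illustration of the generic TGDR idea named in the abstract, the sketch below applies thresholded gradient updates to an ordinary least-squares loss. It is not the integrative semiparametric procedure proposed in the paper: the function name `tgdr_linear`, the parameters `tau`, `step`, and `n_iter`, and the choice of a squared-error loss are all assumptions made for illustration.

```python
import numpy as np


def tgdr_linear(X, y, tau=0.9, step=0.01, n_iter=300):
    """Illustrative TGDR for least squares (hypothetical helper, not the paper's method).

    At each iteration only the coordinates whose gradient magnitude is within
    a factor `tau` of the largest one are updated, which keeps most
    coefficients at exactly zero when the loop is stopped early.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Negative gradient of the loss (1 / 2n) * ||y - X beta||^2.
        neg_grad = X.T @ (y - X @ beta) / n
        max_abs = np.max(np.abs(neg_grad))
        if max_abs == 0.0:
            break  # already at a stationary point
        # Threshold step: keep only the coordinates with the largest gradients.
        active = np.abs(neg_grad) >= tau * max_abs
        # Move the active coordinates a small step along the negative gradient.
        beta += step * active * neg_grad
    return beta
```

In this sketch, `tau` close to 1 updates very few coordinates per step and gives sparser fits, while `tau = 0` reduces to plain gradient descent; the number of iterations acts as the regularization parameter and would typically be chosen by cross-validation.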