Radiogenomics is an emerging field in cancer research that combines medical imaging data with genomic data to predict patients' clinical outcomes. In this paper, we propose a multivariate sparse group lasso joint model that integrates imaging and genomic data to build prediction models. Specifically, we jointly consider two models: one regresses imaging features on genomic features, and the other regresses patients' clinical outcomes on genomic features. The sparse group lasso regularization penalties incorporate intrinsic group information, e.g., biological pathways and imaging categories, so that the model selects both important groups and important features within each group. To integrate information across the two models, we introduce in each model a weight in the penalty term of each individual genomic feature, where the weight is inversely related to that feature's coefficient in the other model. This weight gives a feature a higher chance of being selected by one model if it is selected by the other. Our model is applicable to both continuous and time-to-event outcomes. It also allows two separate datasets to be used to fit the two models, addressing the practical challenge that many genomic datasets have no imaging data available. Simulations and real data analyses demonstrate that our method outperforms existing methods in the literature.
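To make the penalty structure concrete, the following is a minimal sketch of a weighted sparse group lasso penalty with cross-model weights. It is an illustration under stated assumptions, not the estimator used in the paper: the function names `cross_model_weights` and `weighted_sgl_penalty`, the reciprocal weight form `1 / (|coefficient| + eps)`, the mixing parameter `alpha`, and the square-root group-size scaling are all hypothetical choices, since the abstract specifies only that each weight decreases as the feature's coefficient in the other model grows.

```python
import numpy as np


def cross_model_weights(other_coef, eps=1e-3):
    """Hypothetical per-feature weights from the other model's coefficients.

    Assumed form: 1 / (magnitude + eps), so a feature with a large
    coefficient in the other model receives a small penalty weight here.
    """
    other_coef = np.asarray(other_coef, dtype=float)
    if other_coef.ndim == 1:
        magnitude = np.abs(other_coef)
    else:
        # Multivariate model: one row of coefficients per genomic feature,
        # summarised by its Euclidean norm (an assumption).
        magnitude = np.linalg.norm(other_coef, axis=1)
    return 1.0 / (magnitude + eps)


def weighted_sgl_penalty(coef, groups, weights, lam=1.0, alpha=0.5):
    """Weighted sparse group lasso penalty for a coefficient vector.

    coef    : coefficients of the genomic features in this model
    groups  : group label (e.g. pathway) for each feature
    weights : per-feature weights applied to the L1 term
    """
    coef = np.asarray(coef, dtype=float)
    groups = np.asarray(groups)
    weights = np.asarray(weights, dtype=float)

    # Feature-level term: weighted L1 norm.
    l1_term = np.sum(weights * np.abs(coef))

    # Group-level term: L2 norm of each group, scaled by sqrt(group size).
    group_term = 0.0
    for g in np.unique(groups):
        members = groups == g
        group_term += np.sqrt(members.sum()) * np.linalg.norm(coef[members])

    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)
```

In a joint fit under these assumptions, one would alternate between the two models, recomputing each model's weights from the other model's current coefficients before each update, so that a feature selected by one model is penalized less in the other.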