In clinical trials, the identification of prognostic and predictive biomarkers is essential to precision medicine. Prognostic biomarkers can help prevent the occurrence of the disease, and predictive biomarkers can be used to identify patients likely to benefit from the treatment. Previous research has focused mainly on clinical characteristics, and the use of genomic data in this area has hardly been studied. A new method is required to simultaneously select prognostic and predictive biomarkers in high-dimensional genomic data where biomarkers are highly correlated. We propose a novel approach called PPLasso (Prognostic Predictive Lasso) that integrates prognostic and predictive effects into a single statistical model. PPLasso also takes into account the correlations between biomarkers, which can alter the accuracy of biomarker selection. The method transforms the design matrix to remove the correlations between the biomarkers before applying the generalized Lasso. In a comprehensive numerical evaluation, we show that PPLasso outperforms the traditional Lasso approach in identifying both prognostic and predictive biomarkers in various scenarios. Finally, our method is applied to publicly available transcriptomic data from the RV144 clinical trial. Our method is implemented in the PPLasso R package, which will soon be available from the Comprehensive R Archive Network (CRAN).
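The decorrelation step described above can be illustrated with a minimal Python sketch. It is not the PPLasso R package and does not reproduce its API. The abstract does not specify the model, so everything here is an assumption made for illustration: the function names `whiten` and `build_design` are hypothetical, the whitening uses the inverse square root of the sample covariance matrix, and prognostic and predictive effects are encoded as main-effect and treatment-interaction columns respectively. The sketch assumes more samples than biomarkers so that the sample covariance is invertible; the high-dimensional setting targeted by the paper would need a structured or regularized covariance estimate. The Lasso fit itself is omitted.

```python
import numpy as np


def whiten(X, ridge=1e-8):
    """Decorrelate the columns of X (hypothetical helper, not the package API).

    Returns (X_tilde, W), where X_tilde = Xc @ W, Xc is the column-centered X,
    and W approximates the inverse square root of the sample covariance.
    """
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / (X.shape[0] - 1)
    # Symmetric eigendecomposition: sigma = V diag(eigval) V^T
    eigval, eigvec = np.linalg.eigh(sigma)
    # Small ridge term guards against division by near-zero eigenvalues
    W = eigvec @ np.diag(1.0 / np.sqrt(eigval + ridge)) @ eigvec.T
    return Xc @ W, W


def build_design(X, treatment):
    """Stack main-effect columns (assumed prognostic part) and
    treatment-by-biomarker columns (assumed predictive part)."""
    t = np.asarray(treatment, dtype=float).reshape(-1, 1)
    return np.hstack([X, t * X])


# Simulated example: 5 strongly correlated biomarkers, 200 patients
rng = np.random.default_rng(0)
n, p = 200, 5
cov = 0.8 * np.ones((p, p)) + 0.2 * np.eye(p)  # pairwise correlation 0.8
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
treatment = rng.integers(0, 2, size=n)  # 0 = control, 1 = treated

X_tilde, W = whiten(X)
design = build_design(X_tilde, treatment)

# Sample covariance of the transformed biomarkers is close to the identity
cov_tilde = np.cov(X_tilde, rowvar=False)
```

A sparse regression fitted on `design` would estimate coefficients on the whitened scale. Mapping them back to the original biomarkers (here by multiplying each coefficient block by `W`) generally destroys exact sparsity, which is one reason a generalized Lasso formulation, penalizing the original-scale coefficients, is a natural fit for this kind of transformation.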