Identification of prognostic and predictive biomarkers in high-dimensional data with PPLasso

In clinical trials, the identification of prognostic and predictive biomarkers is essential to precision medicine. Prognostic biomarkers can be useful for preventing the occurrence of the disease, and predictive biomarkers can be used to identify patients likely to benefit from the treatment. Previous research has mainly focused on clinical characteristics, and the use of genomic data in this area has hardly been studied. A new method is required to simultaneously select prognostic and predictive biomarkers in high-dimensional genomic data where biomarkers are highly correlated. We propose a novel approach called PPLasso (Prognostic Predictive Lasso), which integrates prognostic and predictive effects into one statistical model. PPLasso also takes into account the correlations between biomarkers, which can alter the accuracy of biomarker selection. Our method consists of transforming the design matrix to remove the correlations between the biomarkers before applying the generalized Lasso. In a comprehensive numerical evaluation, we show that PPLasso outperforms the traditional Lasso approach on both prognostic and predictive biomarker identification in various scenarios. Finally, our method is applied to publicly available transcriptomic data from the clinical trial RV144. Our method is implemented in the PPLasso R package, available from the Comprehensive R Archive Network (CRAN).
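The decorrelation step described above can be illustrated with a minimal sketch. It is not the PPLasso package's code or API: it assumes the covariance matrix of the biomarkers is known (an AR(1)-style matrix chosen here purely for illustration), whereas in practice it would have to be estimated. The function names `inv_sqrt`, `whiten` and `mean_abs_offdiag_corr` are hypothetical helpers. Multiplying the design matrix by the inverse square root of the covariance leaves columns that are approximately uncorrelated. Because this changes the coefficients being estimated, a sparsity penalty on the original coefficients is no longer a plain Lasso penalty after the transformation, which is presumably why a generalized Lasso is applied afterwards; that step is omitted here.

```python
import numpy as np


def inv_sqrt(sigma):
    """Return sigma^(-1/2) for a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(sigma)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T


def whiten(X, sigma):
    """Decorrelate the columns of X, assuming they have covariance sigma."""
    return X @ inv_sqrt(sigma)


def mean_abs_offdiag_corr(X):
    """Average absolute empirical correlation between distinct columns."""
    corr = np.corrcoef(X, rowvar=False)
    p = corr.shape[0]
    # The diagonal of a correlation matrix is all ones, so subtract p.
    return (np.abs(corr).sum() - p) / (p * (p - 1))


# Simulated, highly correlated "biomarkers" (illustrative values only).
rng = np.random.default_rng(0)
n, p, rho = 500, 20, 0.8
idx = np.arange(p)
sigma = rho ** np.abs(np.subtract.outer(idx, idx))  # assumed known covariance
X = rng.multivariate_normal(np.zeros(p), sigma, size=n)

X_tilde = whiten(X, sigma)

before = mean_abs_offdiag_corr(X)
after = mean_abs_offdiag_corr(X_tilde)
print(f"mean |correlation| before: {before:.3f}, after: {after:.3f}")
```

The printed averages show how much of the correlation between columns is removed by the transformation. With an estimated covariance, the result depends on how well that estimate matches the true correlation structure.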
