Reproducible translation of transcriptomics data has been hampered by the ubiquitous presence of batch effects. Statistical methods for managing batch effects were initially developed for sample group comparison and later borrowed for other settings such as survival outcome prediction. The most notable such method is ComBat, which adjusts for batch effects by including batch as a covariate alongside sample group in a linear regression. In survival prediction, however, ComBat is used without definable sample groups for the survival outcome, and it is carried out as a separate step, sequentially with survival regression, for a potentially confounded outcome. To address these issues, we propose a new method called BatMan ("BATch MitigAtion via stratificatioN"). It adjusts for batches by treating them as strata in survival regression and uses variable selection methods such as LASSO to handle high dimensionality. We assess the performance of BatMan in comparison with ComBat, each used either alone or in conjunction with data normalization, in a re-sampling-based simulation study under various levels of predictive signal strength and patterns of batch-outcome association. Our simulations show that (1) BatMan outperforms ComBat in nearly all scenarios when there are batch effects in the data, and (2) the performance of both methods can be worsened by the addition of data normalization. We further evaluate the two methods using microRNA data for ovarian cancer from The Cancer Genome Atlas and find that BatMan outperforms ComBat, while the addition of data normalization worsens the prediction. Our study thus shows the advantage of BatMan and raises caution about the naive use of data normalization when developing survival prediction models. The BatMan method and the simulation tool for performance assessment are implemented in R and publicly available at https://github.com/LXQin/PRECISION.survival.
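
As a rough illustration of what "batches as strata" combined with LASSO can look like, the standard stratified Cox partial likelihood with an L1 penalty is sketched below. The notation is assumed for illustration and does not come from the text above: $b = 1, \dots, B$ indexes batches, $x_{bi}$ is the expression vector of sample $i$ in batch $b$, $t_{bi}$ its observed time, $\delta_{bi}$ its event indicator, $R_b(t)$ the set of samples in batch $b$ still at risk at time $t$, and $\lambda \ge 0$ a tuning parameter. The exact formulation used by BatMan may differ.

```latex
% Stratified Cox log partial likelihood: each risk set R_b(t) is restricted
% to samples from the same batch b, so every batch keeps its own
% unspecified baseline hazard while sharing the coefficient vector beta.
\ell(\beta) = \sum_{b=1}^{B} \sum_{i:\,\delta_{bi}=1}
  \left[ x_{bi}^{\top}\beta
  - \log \sum_{j \in R_b(t_{bi})} \exp\!\left( x_{bj}^{\top}\beta \right) \right]

% LASSO-penalized estimate (assumed form): the L1 penalty shrinks many
% coefficients exactly to zero, which handles high dimensionality.
\hat{\beta} = \arg\min_{\beta}
  \left\{ -\ell(\beta) + \lambda \lVert \beta \rVert_{1} \right\}
```

Because samples are compared only within their own batch, an additive batch-level shift in the log baseline hazard cancels out of each term. This is one reason stratification can be preferable to correcting the expression data in a separate step before the survival model is fit.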