In many high-dimensional prediction or classification tasks, complementary data on the features ("co-data") are available, e.g. prior biological knowledge on (epi)genetic markers. Here we consider tasks with numerical prior information that provides insight into the importance (weight) and the direction (sign) of the feature effects, e.g. regression coefficients from previous studies. We propose an approach for integrating multiple sources of such prior information into penalised regression. If suitable co-data are available, this improves the predictive performance, as shown by simulation and application. The proposed method is implemented in the R package 'transreg' (https://github.com/lcsb-bds/transreg).
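To make the idea of signed, weighted prior information concrete, the following minimal Python sketch shows one simple way prior coefficients can enter a penalised regression: a ridge penalty that shrinks the estimate towards the prior effects instead of towards zero. This is a generic illustration and not the algorithm implemented in 'transreg'; the function names `solve_linear_system` and `ridge_towards_prior`, the single prior source, and the fixed penalty `lam` are all assumptions made for the example.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_towards_prior(x, y, prior, lam):
    """Minimise ||y - X b||^2 + lam * ||b - prior||^2 (illustrative only).

    Closed form: b = (X'X + lam * I)^{-1} (X'y + lam * prior).
    With prior = 0 this reduces to ordinary ridge regression.
    """
    n, p = len(x), len(prior)
    gram = [
        [sum(x[i][j] * x[i][k] for i in range(n)) + (lam if j == k else 0.0)
         for k in range(p)]
        for j in range(p)
    ]
    rhs = [
        sum(x[i][j] * y[i] for i in range(n)) + lam * prior[j]
        for j in range(p)
    ]
    return solve_linear_system(gram, rhs)


# Toy data: two features, two samples (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0]]
response = [2.0, 4.0]
prior_effects = [0.0, 2.0]  # e.g. coefficients from a previous study
print(ridge_towards_prior(features, response, prior_effects, lam=1.0))
```

In this sketch, a larger `lam` pulls the estimate closer to the prior effects, while `lam` close to zero lets the target data dominate. Combining several prior sources and tuning how much each one is trusted are the parts the proposed method addresses, and they are not covered here.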