Nowadays, clinical research routinely uses omics data, such as gene expression, for predicting clinical outcomes or selecting markers. Additionally, so-called co-data are often available, providing complementary information on the covariates, like p-values from previously published studies or groups of genes corresponding to pathways. Elastic net penalisation is widely used for prediction and covariate selection. Group-adaptive elastic net penalisation learns from co-data to improve the prediction and covariate selection, by penalising important groups of covariates less than other groups. Existing methods are, however, computationally expensive. Here we present a fast method for marginal likelihood estimation of group-adaptive elastic net penalties for generalised linear models. We first derive a low-dimensional representation of the Taylor approximation of the marginal likelihood and its first derivative for group-adaptive ridge penalties, to efficiently estimate these penalties. Then we show by using asymptotic normality of the linear predictors that the marginal likelihood for elastic net models may be approximated well by the marginal likelihood for ridge models. The ridge group penalties are then transformed to elastic net group penalties by using the variance function. The method allows for overlapping groups and unpenalised variables. We demonstrate the method in a model-based simulation study and an application to cancer genomics. The method substantially decreases computation time and outperforms or matches other methods by learning from co-data.
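To make the idea of group-specific penalties concrete, the following minimal sketch fits a ridge regression in which each covariate group receives its own penalty. It is an illustration only and not the method presented here: it assumes a linear model without intercept, non-overlapping groups, and penalties fixed by hand rather than estimated by marginal likelihood. The function name `group_ridge_fit`, the simulated data, and the penalty values are all hypothetical.

```python
import numpy as np

def group_ridge_fit(X, y, groups, group_penalties):
    """Ridge estimate with one penalty per covariate group.

    Assumptions (illustrative only): linear model, no intercept,
    non-overlapping groups, penalties supplied by the caller.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    # Map each covariate to the penalty of the group it belongs to.
    penalties = np.asarray(group_penalties, dtype=float)[groups]
    # Solve the penalised normal equations (X'X + diag(penalties)) beta = X'y.
    A = X.T @ X + np.diag(penalties)
    return np.linalg.solve(A, X.T @ y)

# Simulated example: group 0 carries signal, group 1 is pure noise.
rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.standard_normal((n, p))
groups = np.repeat([0, 1], p // 2)  # hypothetical co-data: two groups of 10 covariates
beta_true = np.where(groups == 0, 1.0, 0.0)
y = X @ beta_true + rng.standard_normal(n)

# Same penalty for both groups (ordinary ridge) versus a smaller
# penalty on the group that the co-data flag as important.
beta_equal = group_ridge_fit(X, y, groups, [10.0, 10.0])
beta_adaptive = group_ridge_fit(X, y, groups, [1.0, 100.0])
```

With equal group penalties the estimate reduces to ordinary ridge regression. The group-adaptive setting differs only in the diagonal penalty matrix, which is the quantity a method of this kind would learn from co-data.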