High-dimensional prediction considers data with more variables than samples. Generic research goals are to find the best predictor or to select variables. Results may be improved by exploiting prior information in the form of co-data, which provides complementary information not on the samples but on the variables. We consider adaptive ridge-penalised generalised linear and Cox models, in which the variable-specific ridge penalties are adapted to the co-data to give a priori more weight to more important variables. The R package ecpc originally accommodated various, possibly multiple, co-data sources, including categorical co-data, i.e. groups of variables, and continuous co-data. Continuous co-data, however, was handled by adaptive discretisation, which potentially models the co-data inefficiently and loses information. Here, we present an extension of the method and software to generic co-data models, in particular for continuous co-data. At its basis lies a classical linear regression model, regressing prior variance weights on the co-data. The parameters of this co-data model are then estimated with empirical Bayes moment estimation. Once the estimation procedure is placed in this classical regression framework, the extension to generalised additive and shape-constrained co-data models is straightforward. In addition, we show how ridge penalties may be transformed to elastic net penalties with the R package squeezy. In simulation studies, we first compare the various continuous co-data models of the extension with the original method. Second, we compare variable selection performance with that of other variable selection methods. Finally, we demonstrate use of the package in several examples throughout the paper.
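The core idea above — a linear co-data model that maps each variable's co-data value to a prior variance, which in turn sets a variable-specific ridge penalty — can be illustrated with a minimal NumPy sketch. This is not the ecpc implementation: the co-data model coefficients `a` and `b` are fixed by assumption here rather than estimated by empirical Bayes moments, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy high-dimensional data: n samples, p variables with p > n,
# plus one continuous co-data value z_j per variable.
n, p = 50, 200
X = rng.standard_normal((n, p))
z = rng.uniform(0.0, 1.0, p)               # co-data: prior importance per variable
beta_true = rng.standard_normal(p) * z     # variables with larger z matter more
y = X @ beta_true + rng.standard_normal(n)

# Assumed linear co-data model for the prior variances: v_j = a + b * z_j.
# The variable-specific ridge penalty is then lambda_j = sigma^2 / v_j,
# so variables deemed important a priori are penalised less.
a, b, sigma2 = 0.1, 1.0, 1.0               # fixed by assumption, not estimated
v = a + b * z
lam = sigma2 / v

# Generalised ridge solution: beta_hat = (X'X + diag(lambda))^{-1} X'y
beta_codata = np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)

# Baseline: a single global penalty (ordinary ridge) for comparison.
beta_global = np.linalg.solve(X.T @ X + np.mean(lam) * np.eye(p), X.T @ y)

mse_codata = np.mean((beta_codata - beta_true) ** 2)
mse_global = np.mean((beta_global - beta_true) ** 2)
print(mse_codata, mse_global)
```

The sketch shows only the penalty-adaptation step; the actual method additionally estimates the co-data model coefficients from the data and supports generalised linear and Cox likelihoods.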