We propose a new method for supervised learning with multiple sets of features ("views"). The multiview problem is especially important in biology and medicine, where "-omics" data such as genomics, proteomics and radiomics are measured on a common set of samples. Cooperative learning combines the usual squared error loss of predictions with an "agreement" penalty to encourage the predictions from different data views to agree. By varying the weight of the agreement penalty, we get a continuum of solutions that includes the well-known early and late fusion approaches. Cooperative learning chooses the degree of agreement (or fusion) in an adaptive manner, using a validation set or cross-validation to estimate test set prediction error. One version of our fitting procedure is modular, where one can choose different fitting mechanisms (e.g., lasso, random forests, boosting, neural networks) appropriate for different data views. In the setting of cooperative regularized linear regression, the method combines the lasso penalty with the agreement penalty, yielding feature sparsity. The method can be especially powerful when the different data views share some underlying relationship in their signals that can be exploited to boost the signals. We show that cooperative learning achieves higher predictive accuracy on simulated data and a real multiomics example of labor onset prediction. Leveraging aligned signals and allowing flexible fitting mechanisms for different modalities, cooperative learning offers a powerful approach to multiomics data fusion.
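The abstract describes the objective only in words, so the following is a minimal sketch of one way the idea could look for two linear views, not the authors' implementation. It assumes the objective takes the form 0.5 * ||y - X bx - Z bz||^2 + 0.5 * rho * ||X bx - Z bz||^2, where the symbol `rho` (the agreement weight), the function names, and the augmented-data trick are all illustrative assumptions. For simplicity it uses plain least squares and leaves out the lasso penalty mentioned above.

```python
import numpy as np


def cooperative_least_squares(X, Z, y, rho):
    """Fit two linear views with a squared-error loss plus an agreement penalty.

    Assumed objective (illustrative, not taken from the abstract):
        0.5 * ||y - X bx - Z bz||^2 + 0.5 * rho * ||X bx - Z bz||^2
    Stacking the rows [-sqrt(rho) X, sqrt(rho) Z] under [X, Z], with zeros
    appended to y, turns this into an ordinary least-squares problem.
    """
    n, p_x = X.shape
    root = np.sqrt(rho)
    top = np.hstack([X, Z])                    # prediction-error rows
    bottom = np.hstack([-root * X, root * Z])  # agreement-penalty rows
    X_aug = np.vstack([top, bottom])
    y_aug = np.concatenate([y, np.zeros(n)])
    coef, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return coef[:p_x], coef[p_x:]


def select_rho(X, Z, y, X_val, Z_val, y_val, rhos):
    """Pick the agreement weight with the lowest validation squared error."""
    best_rho, best_err = None, np.inf
    for rho in rhos:
        bx, bz = cooperative_least_squares(X, Z, y, rho)
        err = np.mean((y_val - X_val @ bx - Z_val @ bz) ** 2)
        if err < best_err:
            best_rho, best_err = rho, err
    return best_rho, best_err


def make_two_view_data(n, rng):
    """Synthetic data in which both views carry the same latent signal."""
    u = rng.normal(size=(n, 2))                # shared latent factors
    X = np.hstack([u, rng.normal(size=(n, 3))]) + 0.5 * rng.normal(size=(n, 5))
    Z = np.hstack([u, rng.normal(size=(n, 3))]) + 0.5 * rng.normal(size=(n, 5))
    y = 2.0 * u.sum(axis=1) + rng.normal(size=n)
    return X, Z, y


rng = np.random.default_rng(0)
X_tr, Z_tr, y_tr = make_two_view_data(200, rng)
X_va, Z_va, y_va = make_two_view_data(200, rng)
rho_grid = [0.0, 0.1, 0.5, 1.0, 5.0]
chosen_rho, val_err = select_rho(X_tr, Z_tr, y_tr, X_va, Z_va, y_va, rho_grid)
```

With `rho = 0` the penalty rows vanish and the sketch reduces to least squares on the concatenated features, which is early fusion. Larger values of `rho` pull the two views' predictions toward each other, and `select_rho` mirrors the validation-set selection of the degree of agreement described above. Adding sparsity would mean swapping the least-squares solve for a lasso solver applied to the same augmented data.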