We propose a new method for supervised learning with multiple sets of features ("views"). The multi-view problem is especially important in biology and medicine, where "-omics" data such as genomics, proteomics and radiomics are measured on a common set of samples. Cooperative learning combines the usual squared error loss of predictions with an "agreement" penalty that encourages the predictions from different data views to agree. By varying the weight of the agreement penalty, we get a continuum of solutions that includes the well-known early and late fusion approaches. Cooperative learning chooses the degree of agreement (or fusion) adaptively, using a validation set or cross-validation to estimate test set prediction error. One version of our fitting procedure is modular: one can choose a different fitting mechanism (e.g. lasso, random forests, boosting, neural networks) for each data view, as appropriate. In the setting of cooperative regularized linear regression, the method combines the lasso penalty with the agreement penalty, yielding feature sparsity. The method can be especially powerful when the different data views share an underlying relationship in their signals that can be exploited to boost those signals. We show that cooperative learning achieves higher predictive accuracy on simulated data and on real multiomics examples: labor onset prediction, and classification of breast ductal carcinoma in situ versus invasive breast cancer. By leveraging aligned signals and allowing flexible fitting mechanisms for different modalities, cooperative learning offers a powerful approach to multiomics data fusion.
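To make the role of the agreement penalty concrete, here is a minimal sketch for two views with linear fits and no lasso penalty. The abstract does not state the objective, so the form used here is an assumption: minimize (1/2)||y - X θx - Z θz||² + (ρ/2)||X θx - Z θz||², where ρ is the weight of the agreement penalty. Under that assumption the problem is ordinary least squares on an augmented design matrix. The function names `cooperative_ls` and `select_rho` are hypothetical and not taken from any published package.

```python
import numpy as np


def cooperative_ls(X, Z, y, rho):
    """Two-view cooperative least squares (illustrative, unregularized).

    Assumed objective:
        0.5 * ||y - X@tx - Z@tz||^2 + 0.5 * rho * ||X@tx - Z@tz||^2
    Stacking the rows [-sqrt(rho) X, sqrt(rho) Z] with zero targets under
    [X, Z] turns this into a single ordinary least-squares problem.
    """
    n, px = X.shape
    s = np.sqrt(rho)
    design = np.vstack([np.hstack([X, Z]), np.hstack([-s * X, s * Z])])
    target = np.concatenate([y, np.zeros(n)])
    theta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return theta[:px], theta[px:]


def select_rho(X, Z, y, X_val, Z_val, y_val, rhos):
    """Pick the agreement weight with the lowest validation squared error."""
    best_rho, best_err = None, np.inf
    for rho in rhos:
        tx, tz = cooperative_ls(X, Z, y, rho)
        err = np.mean((y_val - X_val @ tx - Z_val @ tz) ** 2)
        if err < best_err:
            best_rho, best_err = rho, err
    return best_rho, best_err


def simulate(n, p, rng):
    """Synthetic two-view data whose signal comes from a shared latent factor."""
    u = rng.normal(size=(n, 1))
    X = u + 0.5 * rng.normal(size=(n, p))
    Z = u + 0.5 * rng.normal(size=(n, p))
    y = 2.0 * u[:, 0] + rng.normal(size=n)
    return X, Z, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, Z, y = simulate(200, 5, rng)
    X_val, Z_val, y_val = simulate(200, 5, rng)
    rho, err = select_rho(X, Z, y, X_val, Z_val, y_val, [0.0, 0.25, 0.5, 1.0, 2.0])
    print(f"selected rho={rho}, validation MSE={err:.3f}")
```

In this sketch, ρ = 0 reduces to least squares on the concatenated features (early fusion). At ρ = 1 the cross terms cancel and each view is fit separately, so the combined prediction is the average of the per-view least-squares predictions (a simple form of late fusion). The sparse version described in the abstract would add a lasso penalty to the same augmented problem.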