As omics data have become more widely available in recent years, more multi-omics data have been generated, that is, high-dimensional molecular data comprising several types, such as genomic, transcriptomic, or proteomic data, all obtained from the same patients. Such data lend themselves to being used as covariates in automatic outcome prediction because each omics type may contribute unique information, possibly improving predictions compared to using only one omics data type. Frequently, however, not all omics data types are available for all patients, both in the training data and in the test data, that is, the data to which the automatic prediction rules are to be applied. We refer to this type of data as block-wise missing multi-omics data. First, we provide a literature review of existing prediction methods applicable to such data. Subsequently, using a collection of 13 publicly available multi-omics data sets, we compare the predictive performance of several of these approaches for different block-wise missingness patterns. Finally, we discuss the results of this empirical comparison study and draw some tentative conclusions.