High-dimensional omics data contains intrinsic biomedical information that is crucial for personalised medicine. Nevertheless, it is challenging to capture this information from genome-wide data because of the large number of molecular features and the small number of available samples, a problem known in machine learning as 'the curse of dimensionality'. To tackle this problem and pave the way for machine-learning-aided precision medicine, we proposed a unified multi-task deep learning framework named OmiEmbed, which captures biomedical information from high-dimensional omics data with a deep embedding module and downstream task modules. The deep embedding module learnt an omics embedding that mapped multiple omics data types into a latent space with lower dimensionality. Based on this new representation of the multi-omics data, different downstream task modules were trained simultaneously and efficiently with the multi-task strategy to predict the comprehensive phenotype profile of each sample. OmiEmbed supports multiple tasks for omics data, including dimensionality reduction, tumour type classification, multi-omics integration, demographic and clinical feature reconstruction, and survival prediction. The framework outperformed other methods on all three types of downstream tasks and achieved better performance with the multi-task strategy than when the tasks were trained individually. OmiEmbed is a powerful and unified framework that can be widely adapted to various applications of high-dimensional omics data and has great potential to facilitate more accurate and personalised clinical decision making.
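The idea of a shared embedding feeding several task modules that are trained against one combined objective can be illustrated with a minimal sketch. This is not the OmiEmbed implementation: the single-layer embedding, the two heads (tumour type classification and regression of one clinical feature), the layer sizes, the equal task weights and all function names are assumptions made for illustration, and the input is a synthetic vector rather than real omics data.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multitask_loss(x, tumour_label, clinical_value, params,
                   task_weights=(1.0, 1.0)):
    """Forward pass through a shared embedding and two task heads.

    Returns the weighted sum of a cross-entropy loss (classification head)
    and a squared-error loss (regression head), plus the latent vector.
    """
    # Shared embedding: high-dimensional input -> low-dimensional latent vector
    z = [math.tanh(v) for v in linear(x, *params["embed"])]
    # Head 1 (assumed): tumour type classification
    probs = softmax(linear(z, *params["classify"]))
    cross_entropy = -math.log(probs[tumour_label])
    # Head 2 (assumed): regression of one scaled clinical feature
    prediction = linear(z, *params["regress"])[0]
    squared_error = (prediction - clinical_value) ** 2
    w_cls, w_reg = task_weights
    return w_cls * cross_entropy + w_reg * squared_error, z

rng = random.Random(0)
n_features, n_latent, n_classes = 1000, 16, 5  # hypothetical sizes
params = {
    "embed": init_layer(n_features, n_latent, rng),
    "classify": init_layer(n_latent, n_classes, rng),
    "regress": init_layer(n_latent, 1, rng),
}
x = [rng.random() for _ in range(n_features)]  # synthetic sample
loss, z = multitask_loss(x, tumour_label=2, clinical_value=0.6, params=params)
```

Because both heads read the same latent vector, minimising the combined loss would update the embedding parameters with signal from every task at once, which is the general motivation for training the task modules together rather than one at a time.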