Autoencoder-based integrative multi-omics data embedding that allows for confounder adjustments

In integrative analyses of omics data, it is often of interest to extract a representation of one data type that best reflects its relationship with another data type. This task is traditionally handled by linear methods such as canonical correlation analysis (CCA) and partial least squares (PLS). However, the information in one data type that pertains to the other may be complex and nonlinear. Deep learning provides a convenient alternative for extracting low-dimensional nonlinear data embeddings. In addition, a deep learning setup can naturally incorporate the effects of clinical confounding factors into the integrative analysis. Here we report a deep learning setup, named Autoencoder-based Integrative Multi-omics data Embedding (AIME), to extract data representations for integrative analysis of omics data. The method can adjust for confounder variables, achieve informative data embedding, rank features by their contributions, and find pairs of features from the two data types that are related to each other through the data embedding. In simulation studies, the method was highly effective at extracting the major contributing features between data types. Using a real microRNA-gene expression dataset and a real DNA methylation-gene expression dataset, we show that AIME excluded the influence of confounders, including batch effects, and extracted biologically plausible novel information. An R package, built on Keras with the TensorFlow backend, is available at https://github.com/tianwei-yu/AIME.
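The general idea of an embedding that links two data types while adjusting for confounders can be sketched in a few lines. The example below is a minimal NumPy illustration, not the AIME R package or its API. It assumes one plausible design that the abstract does not spell out: data type X is encoded into a low-dimensional embedding Z, the confounders C are concatenated to Z at the bottleneck, and a decoder predicts data type Y from [Z, C], so that confounder effects can be absorbed by the C inputs rather than by the embedding. The function name `train_embedding_sketch`, the layer sizes, the learning rate, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def train_embedding_sketch(X, Y, C, k=2, hidden=8, lr=0.1, epochs=500, seed=0):
    """Hypothetical sketch: X -> tanh layer -> k-dim embedding Z,
    then [Z, C] -> tanh layer -> prediction of Y (mean squared error loss)."""
    rng = np.random.default_rng(seed)
    p, q, m = X.shape[1], Y.shape[1], C.shape[1]
    W1 = rng.normal(0.0, 0.1, (p, hidden))      # encoder, layer 1
    W2 = rng.normal(0.0, 0.1, (hidden, k))      # encoder, layer 2 (to embedding)
    W3 = rng.normal(0.0, 0.1, (k + m, hidden))  # decoder, layer 1 (sees Z and C)
    W4 = rng.normal(0.0, 0.1, (hidden, q))      # decoder, layer 2 (to Y)
    losses = []
    for _ in range(epochs):
        # Forward pass
        H1 = np.tanh(X @ W1)
        Z = H1 @ W2                    # low-dimensional embedding of X
        ZC = np.hstack([Z, C])         # confounders joined at the bottleneck
        H2 = np.tanh(ZC @ W3)
        err = H2 @ W4 - Y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass (plain full-batch gradient descent)
        dY = 2.0 * err / err.size
        dW4 = H2.T @ dY
        dA2 = (dY @ W4.T) * (1.0 - H2 ** 2)
        dW3 = ZC.T @ dA2
        dZ = (dA2 @ W3.T)[:, :k]       # only the Z part flows back to the encoder
        dW2 = H1.T @ dZ
        dA1 = (dZ @ W2.T) * (1.0 - H1 ** 2)
        dW1 = X.T @ dA1
        W1 -= lr * dW1
        W2 -= lr * dW2
        W3 -= lr * dW3
        W4 -= lr * dW4

    def encode(X_new):
        """Map new rows of X to the learned embedding."""
        return np.tanh(X_new @ W1) @ W2

    return encode, losses

# Synthetic data (made up for this sketch): Y depends nonlinearly on two
# features of X, plus an additive effect of a single confounder C.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 10))
C = rng.normal(size=(n, 1))
signal = np.tanh(X[:, :1] * X[:, 1:2])
Y = signal @ rng.normal(size=(1, 5)) + C @ rng.normal(size=(1, 5))
Y = Y + 0.1 * rng.normal(size=(n, 5))

encode, losses = train_embedding_sketch(X, Y, C)
embedding = encode(X)  # one k-dimensional point per sample
```

In this sketch the embedding is read off from the encoder alone, so the confounder inputs are used only while fitting the decoder. Feature ranking and feature-pair discovery, which the method also provides, are not shown here.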
