Manifold learning approaches such as Stochastic Neighbour Embedding (SNE), Locally Linear Embedding (LLE) and Isometric Feature Mapping (ISOMAP) have been proposed for non-linear dimensionality reduction. These methods aim to produce two- or three-dimensional latent embeddings, so that the data can be visualised in an intelligible representation. This manuscript proposes extensions of Student's t-distributed SNE (t-SNE), LLE and ISOMAP that allow dimensionality reduction and subsequent visualisation of multi-view data. It is now very common to have multiple data-views on the same samples, with each data-view containing a set of features that describes a different aspect of those samples. For example, in biomedical studies it is possible to generate multiple OMICS data sets for the same individuals, such as transcriptomics, genomics and epigenomics, enabling a better understanding of the relationships between different biological processes. The visualisation performance of the proposed methods is illustrated through the analysis of real and simulated data sets. Data visualisations are often used to identify potential clusters in data sets. We show that when the low-dimensional embeddings obtained via the multi-view manifold learning approaches are passed to the K-means algorithm, clusters of samples are accurately identified. The proposed multi-SNE method outperforms the proposed multi-ISOMAP and multi-LLE methods. Interestingly, multi-SNE is found to perform comparably to methods proposed in the literature for multi-view clustering.
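The clustering step described above, in which K-means is run on a low-dimensional embedding, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The list `embedding` is a hypothetical stand-in for the two-dimensional output of a method such as multi-SNE. The helper names `sq_dist` and `kmeans` are assumptions, and so is the farthest-first initialisation, which is used here only to keep the example deterministic.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's K-means; returns (labels, centroids).

    Initialisation is farthest-first (an illustrative choice, not taken
    from the source): start from the first point, then repeatedly add the
    point that is farthest from all centroids chosen so far.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical 2-D embedding: two groups of 20 samples each, scattered
# uniformly within +/-1 of (0, 0) and (10, 10). In practice this would be
# the output of a (multi-view) manifold learning method.
rng = random.Random(0)
embedding = [
    (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
    for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
    for _ in range(20)
]

labels, centroids = kmeans(embedding, k=2)
print(labels)
print(centroids)
```

Because the two toy groups are well separated, the recovered labels match the groups exactly. On real embeddings the number of clusters and the initialisation would both need more care.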