High-dimensional multimodal data arises in many scientific fields. Integrating multimodal data becomes challenging when there is no known correspondence between the samples or the features of the different datasets. To tackle this challenge, we introduce AVIDA, a framework for simultaneously performing data alignment and dimension reduction. In the numerical experiments, Gromov-Wasserstein optimal transport and t-distributed stochastic neighbor embedding are used as the alignment and dimension reduction modules, respectively. Using four synthetic datasets and two real multimodal single-cell datasets, we show that AVIDA correctly aligns high-dimensional datasets that share no common features. We demonstrate that, compared to several existing methods, AVIDA better preserves the structure of the individual datasets, especially distinct local structures, in the joint low-dimensional visualization, while achieving comparable alignment performance. This property is important in multimodal single-cell data analysis because some biological processes are captured by only one of the datasets. In general applications, other methods can be substituted for the alignment and dimension reduction modules.
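To make the alignment module concrete, below is a minimal numpy sketch of entropic Gromov-Wasserstein optimal transport, following the standard projected-gradient scheme (alternating a linearization of the GW objective with Sinkhorn projections). This is an illustration of the technique the abstract names, not AVIDA's actual implementation; the function name `entropic_gw`, the synthetic data, and all parameter values (`eps`, iteration counts) are assumptions chosen for readability.

```python
import numpy as np

def entropic_gw(C1, C2, p, q, eps=0.5, outer=10, inner=300):
    # Entropic Gromov-Wasserstein coupling with squared loss.
    # C1, C2: intra-dataset distance matrices; p, q: sample weights.
    # Alternates a linearization of the GW objective with Sinkhorn
    # projections onto couplings having marginals p and q.
    n, m = len(p), len(q)
    T = np.outer(p, q)                       # independent coupling as a start
    constC = (np.outer((C1 ** 2) @ p, np.ones(m))
              + np.outer(np.ones(n), (C2 ** 2) @ q))
    for _ in range(outer):
        grad = constC - 2.0 * C1 @ T @ C2.T  # gradient of the GW loss at T
        K = np.exp(-grad / eps)              # Gibbs kernel for Sinkhorn
        u = np.ones(n)
        for _ in range(inner):               # Sinkhorn scaling iterations
            v = q / (K.T @ u)
            u = p / (K @ v)
        T = u[:, None] * K * v[None, :]      # coupling with marginals ~(p, q)
    return T

# Two "modalities" with different feature spaces: only the intra-dataset
# distance matrices are compared, so no common features are needed.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                 # dataset 1: 30 samples, 5 features
Y = rng.normal(size=(40, 8))                 # dataset 2: 40 samples, 8 features
C1 = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
C2 = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
C1 /= C1.max()                               # put the two metrics on a
C2 /= C2.max()                               # comparable scale
p = np.full(30, 1 / 30)                      # uniform sample weights
q = np.full(40, 1 / 40)
T = entropic_gw(C1, C2, p, q)                # soft sample-to-sample correspondence
```

The returned coupling `T` gives a soft correspondence between the samples of the two datasets; a dimension reduction module (t-SNE in the paper's experiments) would then embed both datasets jointly using this correspondence.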