High-dimensional multimodal data arises in many scientific fields. The integration of multimodal data becomes challenging when there is no known correspondence between the samples and the features of different datasets. To tackle this challenge, we introduce AVIDA, a framework for simultaneously performing data alignment and dimension reduction. In the numerical experiments, Gromov-Wasserstein optimal transport and t-distributed stochastic neighbor embedding are used as the alignment and dimension reduction modules, respectively. We show that AVIDA correctly aligns high-dimensional datasets without common features on four synthetic datasets and two real multimodal single-cell datasets. Compared to several existing methods, AVIDA better preserves the structures of the individual datasets, especially distinct local structures, in the joint low-dimensional visualization, while achieving comparable alignment performance. This property is important in multimodal single-cell data analysis, as some biological processes are uniquely captured by one of the datasets. In general applications, other methods can be substituted for the alignment and dimension reduction modules.
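As a rough illustration of the two modules named above, the sketch below chains an off-the-shelf Gromov-Wasserstein solver (from the POT library) with scikit-learn's t-SNE on two feature-disjoint datasets. This is only an assumed pipeline for intuition, not the AVIDA algorithm itself, which couples alignment and embedding rather than running them sequentially; the data, sizes, and barycentric-projection step are illustrative choices.

```python
# Minimal sketch (not the authors' AVIDA implementation): align two datasets
# with no shared features via Gromov-Wasserstein, then embed jointly with t-SNE.
import numpy as np
import ot                                   # POT: Python Optimal Transport
from scipy.spatial.distance import cdist
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))              # modality 1: 200 samples, 50 features
Y = rng.normal(size=(150, 30))              # modality 2: 150 samples, 30 features

# Alignment module: Gromov-Wasserstein compares intra-dataset distance
# structures, so no feature correspondence between X and Y is required.
Cx = cdist(X, X); Cx /= Cx.max()
Cy = cdist(Y, Y); Cy /= Cy.max()
p = np.full(len(X), 1.0 / len(X))           # uniform sample weights
q = np.full(len(Y), 1.0 / len(Y))
T = ot.gromov.gromov_wasserstein(Cx, Cy, p, q, loss_fun='square_loss')

# Barycentric projection: express each sample of Y as a coupling-weighted
# average of samples of X, putting both datasets in one feature space.
Y_on_X = (T.T / T.sum(axis=0)[:, None]) @ X

# Dimension-reduction module: visualize the aligned data jointly with t-SNE.
joint = np.vstack([X, Y_on_X])
embedding = TSNE(n_components=2, random_state=0).fit_transform(joint)
print(embedding.shape)                      # (350, 2)
```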