Integrating multimodal data is challenging when studying a given phenomenon with different instruments or under different conditions generates distinct but related domains. Many existing data integration methods assume a known one-to-one correspondence between domains across the entire dataset, which may be unrealistic. Furthermore, existing manifold alignment methods are not suited for cases where the data contain domain-specific regions, i.e., where a portion of the data in one domain has no counterpart in the other. We propose Diffusion Transport Alignment (DTA), a semi-supervised manifold alignment method that exploits prior correspondence knowledge between only a few points to align the domains. By building a diffusion process, DTA finds a transportation plan between data measured from two heterogeneous domains with different feature spaces, which, by assumption, share a similar geometrical structure arising from the same underlying data-generating process. DTA can also compute a partial alignment in a data-driven fashion, resulting in accurate alignments when some data are measured in only one domain. We empirically demonstrate that DTA outperforms other methods in aligning multimodal data in this semi-supervised setting. We also show empirically that the alignment obtained by DTA can improve the performance of machine learning tasks such as domain adaptation, inter-domain feature mapping, and exploratory data analysis, while outperforming competing methods.
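To make the idea of combining a diffusion process with a transportation plan concrete, the following is a minimal illustrative sketch, not the DTA algorithm itself. It assumes a Gaussian-kernel diffusion operator in each domain, uses the diffusion probabilities of reaching a few known corresponding points (anchors) as shared coordinates, and solves an entropy-regularized optimal transport problem with Sinkhorn iterations. All function names, the bandwidth heuristic, and the parameters `t`, `eps`, and `n_iter` are assumptions made for illustration.

```python
import numpy as np

def diffusion_operator(X, t=8):
    """Row-stochastic Gaussian-kernel Markov matrix on X, raised to the power t."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sigma2 = np.median(sq[sq > 0])            # heuristic bandwidth (assumption)
    K = np.exp(-sq / sigma2)
    P = K / K.sum(axis=1, keepdims=True)      # one-step transition probabilities
    return np.linalg.matrix_power(P, t)       # t-step transition probabilities

def sinkhorn(C, eps=0.1, n_iter=500):
    """Entropy-regularized transport plan between uniform marginals."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    v = b / (K.T @ u)                         # final update: column sums equal b
    return u[:, None] * K * v[None, :]

def align(X, Y, anchors_x, anchors_y, t=8):
    """Hypothetical alignment: anchors_x[k] in X is known to match anchors_y[k] in Y."""
    # Probability of diffusing from every point to each anchor after t steps.
    FX = diffusion_operator(X, t)[:, anchors_x]
    FY = diffusion_operator(Y, t)[:, anchors_y]
    FX = FX / FX.sum(axis=1, keepdims=True)
    FY = FY / FY.sum(axis=1, keepdims=True)
    # Cross-domain cost: distance between anchor-diffusion profiles.
    C = np.linalg.norm(FX[:, None, :] - FY[None, :, :], axis=-1)
    return sinkhorn(C / C.max())

# Toy data: the same circle observed in a 2-feature and a 5-feature domain.
rng = np.random.default_rng(0)
theta = np.sort(rng.uniform(0, 2 * np.pi, 40))
X = np.c_[np.cos(theta), np.sin(theta)]
Y = X @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(40, 5))
anchors = np.array([0, 10, 20, 30])           # only four known correspondences
T = align(X, Y, anchors, anchors)             # T[i, j]: mass moved from X[i] to Y[j]
```

Each row of `T` can be read as a soft assignment of one point in the first domain to points in the second. This sketch enforces full uniform marginals, so it does not handle the partial-alignment case in which some data exist in only one domain.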