Multi-domain data is becoming increasingly common and presents both challenges and opportunities for the data science community. Integrating distinct data views can support exploratory data analysis and benefit downstream analyses, including machine learning tasks. With this in mind, we present a novel manifold alignment method called MALI (Manifold Alignment with Label Information) that learns a correspondence between two distinct domains. MALI occupies a middle ground between the more commonly addressed semi-supervised manifold alignment problem, in which some correspondences between the two domains are known, and the purely unsupervised case, in which no correspondences are provided. To this end, MALI learns the manifold structure in both domains via a diffusion process and then leverages discrete class labels to guide the alignment. By aligning the two domains, MALI recovers a pairing and a common representation that reveal related samples in both domains. Additionally, MALI can be used for the transfer learning problem known as domain adaptation. We show that MALI outperforms current state-of-the-art manifold alignment methods across multiple datasets.
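The two stages named above (a diffusion process within each domain, then class labels guiding the cross-domain alignment) can be illustrated with a rough sketch. This is not the MALI algorithm itself, whose details are not given here. It assumes a Gaussian kernel, a fixed number of diffusion steps `t`, and a one-to-one pairing obtained with the Hungarian algorithm. The names `diffusion_operator`, `label_profiles`, and `align` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def diffusion_operator(X, sigma=1.0, t=3):
    """Row-stochastic Gaussian-kernel Markov matrix, raised to the t-th power."""
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * sigma**2))
    P = K / K.sum(axis=1, keepdims=True)  # each row sums to 1
    return np.linalg.matrix_power(P, t)


def label_profiles(X, y, n_classes, sigma=1.0, t=3):
    """Diffuse one-hot labels: row i is the class mass reached from sample i in t steps."""
    onehot = np.eye(n_classes)[y]
    return diffusion_operator(X, sigma, t) @ onehot


def align(X_a, y_a, X_b, y_b, n_classes, sigma=1.0, t=3):
    """Pair samples across domains by matching their diffused label profiles.

    Assumption: a one-to-one assignment stands in for whatever
    correspondence the actual method learns.
    """
    prof_a = label_profiles(X_a, y_a, n_classes, sigma, t)
    prof_b = label_profiles(X_b, y_b, n_classes, sigma, t)
    # The profiles live in the shared label space, so the two domains
    # may have different feature dimensions.
    cost = cdist(prof_a, prof_b, "euclidean")
    rows, cols = linear_sum_assignment(cost)
    return rows, cols
```

Because the cost is computed in the shared label space rather than in either feature space, the sketch applies even when the two domains have no features in common. It only uses class labels and within-domain geometry, so it needs no known sample-level correspondences.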