Transfer learning aims to learn a classifier for a target domain by transferring knowledge from a source domain. In practice, however, it can be very difficult because of two main issues: feature discrepancy and distribution divergence. In this paper, we present a framework called TLF that builds a classifier for a target domain with only a few labeled training records by transferring knowledge from a source domain with many labeled records. While existing methods often focus on one issue and leave the other for future work, TLF handles both simultaneously. In TLF, we alleviate feature discrepancy by identifying shared label distributions that act as pivots to bridge the domains. We handle distribution divergence by simultaneously optimizing the structural risk functional, the joint distributions between domains, and the manifold consistency underlying the marginal distributions. For the manifold consistency, we exploit its intrinsic properties by identifying the k nearest neighbors of each record, where the value of k is determined automatically in TLF. Furthermore, to avoid negative transfer, we consider only the source records that belong to the source pivots during knowledge transfer. We evaluate TLF on seven publicly available natural datasets and compare its performance against that of eleven state-of-the-art techniques. We also evaluate the effectiveness of TLF in some challenging situations. Our experimental results, including statistical sign test and Nemenyi test analyses, indicate a clear superiority of the proposed framework over the state-of-the-art techniques.
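To make the manifold-consistency idea concrete, the following is a minimal sketch of a generic k-nearest-neighbor graph Laplacian penalty, the kind of term commonly used to encourage neighboring records to receive similar predictions. It is not TLF's actual formulation: the function names `knn_graph_laplacian` and `manifold_penalty` are hypothetical, the binary edge weights are an assumption, and k is passed in by hand, whereas TLF determines it automatically by a procedure not described in this abstract.

```python
import numpy as np


def knn_graph_laplacian(X, k):
    """Unnormalized Laplacian L = D - W of a symmetrized k-NN graph.

    Assumption: binary edge weights and a caller-supplied k.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all records.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a record is not its own neighbor

    # Indices of the k nearest neighbors of each record.
    nbrs = np.argsort(d2, axis=1)[:, :k]

    # Binary adjacency, symmetrized so that i~j if either lists the other.
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))
    return D - W


def manifold_penalty(f, L):
    """f^T L f, which equals the sum over graph edges of (f_i - f_j)^2."""
    return float(f @ L @ f)


# Two well-separated clusters of three records each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
L = knn_graph_laplacian(X, k=2)

smooth = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])  # constant per cluster
rough = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])   # varies within clusters

print(manifold_penalty(smooth, L))  # 0.0
print(manifold_penalty(rough, L))   # 16.0
```

Predictions that are constant within each cluster incur no penalty, while predictions that change between neighboring records are penalized, which is the behavior a manifold-consistency term is meant to reward.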