In many practical data mining scenarios, such as network intrusion detection, Twitter spam detection, and computer-aided diagnosis, it is common to have a source domain that is different from but related to the target domain. In addition, a large amount of unlabeled data is available in both the source and target domains, but labeling it is difficult, expensive, time-consuming, and sometimes unnecessary. It is therefore important and worthwhile to fully exploit the labeled and unlabeled data in both domains to solve the task in the target domain. In this paper, a new semi-supervised inductive transfer learning framework, named \emph{Co-Transfer}, is proposed. Co-Transfer first generates three TrAdaBoost classifiers for transfer learning from the source domain to the target domain, and at the same time another three TrAdaBoost classifiers for transfer learning from the target domain to the source domain, using bootstrapped samples from the original labeled data. In each round of Co-Transfer, each group of TrAdaBoost classifiers is refined using the carefully labeled data. Finally, the group of TrAdaBoost classifiers learned to transfer from the source domain to the target domain produces the final hypothesis. Experimental results illustrate that Co-Transfer can effectively exploit and reuse the labeled and unlabeled data in the source and target domains.
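The initialization step described in the abstract (two groups of three classifiers, one per transfer direction, each trained on bootstrapped labeled data) might be sketched as follows. This is only an illustrative sketch under stated assumptions: `PooledCentroidLearner` is a hypothetical stand-in that simply pools both domains and is not TrAdaBoost, the helper names `bootstrap`, `build_group`, and `vote` and the toy data are invented for illustration, majority voting is an assumed way of combining the three classifiers, and the per-round refinement with newly labeled data is omitted because the abstract does not specify it.

```python
import random
from collections import Counter


def bootstrap(labeled, rng):
    """Draw len(labeled) examples from `labeled` with replacement."""
    return [rng.choice(labeled) for _ in labeled]


class PooledCentroidLearner:
    """Hypothetical stand-in for a TrAdaBoost learner (NOT TrAdaBoost itself).

    TrAdaBoost would reweight auxiliary-domain instances over boosting
    rounds; this placeholder just pools both sets and stores class centroids.
    """

    def fit(self, auxiliary, primary):
        sums, counts = {}, Counter()
        for features, label in auxiliary + primary:
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
            counts[label] += 1
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, features):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(features, self.centroids[label]))

        return min(self.centroids, key=sq_dist)


def build_group(auxiliary, primary, rng, size=3):
    """Train `size` learners, each on its own bootstrap of the labeled data."""
    return [
        PooledCentroidLearner().fit(bootstrap(auxiliary, rng), bootstrap(primary, rng))
        for _ in range(size)
    ]


def vote(group, features):
    """Majority vote over a group (an assumed combination rule)."""
    return Counter(clf.predict(features) for clf in group).most_common(1)[0][0]


# Toy labeled data: the target domain is a shifted copy of the source domain.
rng = random.Random(0)
source_labeled = [((0.1 * i, 0.1 * i), 0) for i in range(10)] + [
    ((10 + 0.1 * i, 10 + 0.1 * i), 1) for i in range(10)
]
target_labeled = [((1 + 0.1 * i, 1 + 0.1 * i), 0) for i in range(10)] + [
    ((11 + 0.1 * i, 11 + 0.1 * i), 1) for i in range(10)
]

# One group per transfer direction, as in the abstract.
source_to_target = build_group(source_labeled, target_labeled, rng)
target_to_source = build_group(target_labeled, source_labeled, rng)

# The source-to-target group produces the final hypothesis.
print(vote(source_to_target, (1.0, 1.0)))
```

In a real implementation, each `PooledCentroidLearner` would be replaced by an actual TrAdaBoost classifier, and both groups would be retrained in every round on the data labeled during that round.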