Current Instance Transfer Learning (ITL) methodologies use domain adaptation and sub-space transformation to achieve successful transfer learning. However, these methodologies sometimes overfit on the target dataset or suffer from negative transfer when the test dataset has high variance. Boosting methodologies have been shown to reduce the risk of overfitting by iteratively re-weighting instances with high residuals. However, this balance is usually achieved through parameter optimization, as well as by reducing the skewness in weights produced by the size of the source dataset. While the former can be achieved, the latter is more challenging and can lead to negative transfer. We introduce a simpler and more robust fix to this problem by building upon the popular boosting ITL regression methodology, two-stage TrAdaBoost.R2. Our methodology,~\us{}, is a boosting- and random-forest-based ensemble methodology that utilizes importance sampling to reduce the skewness due to the source dataset. We show that~\us{}~performs better than competing transfer learning methodologies $63\%$ of the time. It also displays consistent performance over diverse datasets with varying complexities, as opposed to the sporadic results observed for other transfer learning methodologies.