Models that perform out-of-domain generalization borrow knowledge from heterogeneous source data and apply it to a related but distinct target task. Transfer learning has proven effective for accomplishing this generalization in many applications. However, a poorly chosen source dataset can degrade performance on the target, a phenomenon called negative transfer. In order to take full advantage of the available source data, this work studies source data selection with respect to a target task. We propose two source selection methods, based on multi-armed bandit theory and on random search, respectively. We conduct a thorough empirical evaluation on both simulated and real data. Our proposals can also be viewed as diagnostics for the existence of reweighted source subsamples that perform better than a random selection of the available samples.
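To make the random-search-based selection concrete, the sketch below shows one plausible way such a selector could work; it is not the paper's actual algorithm. It assumes a scikit-learn-style estimator and a small labeled target validation set, and all names (`select_source_subset`, `n_trials`, `subset_frac`) are hypothetical.

```python
# Illustrative sketch of random-search source selection (assumed, not the
# paper's method): repeatedly draw random source subsets, fit a model on
# each, and keep the subset whose model scores best on target validation data.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def select_source_subset(X_src, y_src, X_tgt_val, y_tgt_val,
                         estimator=None, n_trials=50, subset_frac=0.5,
                         rng=None):
    """Return the indices of the best-performing random source subset."""
    rng = np.random.default_rng(rng)
    estimator = estimator or LogisticRegression(max_iter=1000)
    n = len(X_src)
    best_score, best_idx = -np.inf, None
    for _ in range(n_trials):
        # Sample a candidate source subset without replacement.
        idx = rng.choice(n, size=int(subset_frac * n), replace=False)
        model = clone(estimator).fit(X_src[idx], y_src[idx])
        # Evaluate on the target task; higher score means less negative transfer.
        score = model.score(X_tgt_val, y_tgt_val)
        if score > best_score:
            best_score, best_idx = score, idx
    return best_idx, best_score
```

Comparing `best_score` against the score of a model trained on a uniformly random subset of the same size gives a simple diagnostic, in the spirit of the abstract, for whether a better-than-random source subsample exists.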