Most few-shot learning techniques are pre-trained on a large, labeled "base dataset". In problem domains where such large labeled datasets are not available for pre-training (e.g., X-ray, satellite images), one must resort to pre-training in a different "source" problem domain (e.g., ImageNet), which can be very different from the desired target task. Traditional few-shot and transfer learning techniques fail in the presence of such extreme differences between the source and target tasks. In this paper, we present a simple and effective solution to tackle this extreme domain gap: self-training a source domain representation on unlabeled data from the target domain. We show that this improves one-shot performance on the target domain by 2.9 points on average on the challenging BSCD-FSL benchmark consisting of datasets from multiple domains. Our code is available at https://github.com/cpphoo/STARTUP.
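To make the core idea concrete, below is a minimal sketch of self-training a source-domain representation on unlabeled target data. It assumes a PyTorch setup; `teacher` stands for a model pre-trained on the labeled source domain, while `student`, `target_loader`, and the hyperparameters are hypothetical stand-ins for illustration, not the paper's exact STARTUP recipe (which combines pseudo-label distillation with additional losses).

```python
# Hypothetical self-training sketch: a frozen source-trained teacher
# produces soft pseudo-labels on unlabeled target images, and a student
# is trained to match them. Names and hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def self_train(teacher, student, target_loader, epochs=10, lr=1e-3):
    teacher.eval()  # frozen model pre-trained on the labeled source domain
    optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for (x,) in target_loader:  # batches of unlabeled target-domain images
            with torch.no_grad():
                # soft pseudo-labels from the source-trained teacher
                pseudo = F.softmax(teacher(x), dim=1)
            logits = student(x)
            # match the student's predictive distribution to the pseudo-labels
            loss = F.kl_div(F.log_softmax(logits, dim=1), pseudo,
                            reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

The student's representation thereby adapts to target-domain statistics without any target labels, which is what allows few-shot classifiers built on top of it to generalize across the extreme domain gap.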