Machine learning models are prone to overfitting their training (source) domains, which is commonly believed to be the reason why they falter in novel target domains. Here we examine the contrasting view that multi-source domain generalization (DG) is first and foremost a problem of mitigating source domain underfitting: models not adequately learning the signal already present in their multi-domain training data. Experiments on a reading comprehension DG benchmark show that as a model learns its source domains better -- using familiar methods such as knowledge distillation (KD) from a bigger model -- its zero-shot out-of-domain utility improves at an even faster pace. Improved source domain learning also demonstrates superior out-of-domain generalization over three popular existing DG approaches that aim to limit overfitting. Our implementation of KD-based domain generalization is available via PrimeQA at: https://ibm.biz/domain-generalization-with-kd.
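To make the KD setup mentioned above concrete, below is a minimal sketch of a distillation loss for extractive reading comprehension, where a smaller student is trained to match the start/end span distributions of a larger teacher in addition to fitting the gold answer spans. The function name, temperature `T`, and mixing weight `alpha` are illustrative assumptions rather than the paper's exact configuration; the actual implementation is in PrimeQA at the link above.

```python
# Minimal sketch (assumed, not the paper's exact recipe) of knowledge
# distillation for extractive reading comprehension: combine span
# cross-entropy on gold labels with a KL term toward the teacher's
# temperature-scaled start/end distributions.
import torch
import torch.nn.functional as F

def kd_span_loss(student_start, student_end, teacher_start, teacher_end,
                 gold_start, gold_end, T=2.0, alpha=0.5):
    # Hard-label loss on the annotated answer span positions.
    ce = (F.cross_entropy(student_start, gold_start) +
          F.cross_entropy(student_end, gold_end)) / 2
    # Soft-label loss: KL divergence between student and teacher
    # distributions over token positions, softened by temperature T.
    kl = (F.kl_div(F.log_softmax(student_start / T, dim=-1),
                   F.softmax(teacher_start / T, dim=-1),
                   reduction="batchmean") +
          F.kl_div(F.log_softmax(student_end / T, dim=-1),
                   F.softmax(teacher_end / T, dim=-1),
                   reduction="batchmean")) * (T * T) / 2
    # alpha trades off fitting the gold spans vs. matching the teacher.
    return alpha * ce + (1 - alpha) * kl
```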