Transfer learning, which adapts a model trained on data-rich sources to low-resource targets, is widely applied in natural language processing (NLP). However, when a transfer model is trained on multiple sources, not every source is equally useful for the target. To transfer a model more effectively, it is essential to understand the value of each source. In this paper, we develop SEAL-Shap, an efficient source valuation framework, based on the Shapley value method, for quantifying the usefulness of sources (e.g., domains or languages) in transfer learning. Experiments and comprehensive analyses on both cross-domain and cross-lingual transfer demonstrate that our framework is effective in choosing useful transfer sources and that the resulting source values match intuitive source-target similarity.
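To make the Shapley value idea concrete, the following is a minimal sketch of exact Shapley valuation over a handful of sources. It is not the SEAL-Shap algorithm: the abstract does not describe how the framework achieves its efficiency, and this exact computation enumerates every coalition, so it is practical only for very few sources. The names `shapley_values`, `toy_utility`, and `TOY_GAIN`, the source names, and all numbers are hypothetical. In a real setting the utility would be the target-task score of a model trained on the given coalition of sources.

```python
from itertools import combinations
from math import factorial


def shapley_values(sources, utility):
    """Exact Shapley value of each source under a set-valued utility.

    `utility` maps a frozenset of sources to a score. Here it is a
    stand-in (assumption) for target performance after training on
    that coalition of sources.
    """
    n = len(sources)
    values = {}
    for s in sources:
        others = [x for x in sources if x != s]
        total = 0.0
        for k in range(n):
            # Probability that exactly this size-k coalition precedes s
            # in a uniformly random ordering of the sources.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                marginal = utility(coalition | {s}) - utility(coalition)
                total += weight * marginal
        values[s] = total
    return values


# Hypothetical per-source gains; a negative gain models a harmful source.
TOY_GAIN = {"news": 0.30, "reviews": 0.20, "tweets": -0.05}


def toy_utility(coalition):
    """Toy score: summed gains, minus a redundancy penalty (assumption)
    when the two overlapping sources are used together."""
    score = sum(TOY_GAIN[s] for s in coalition)
    if {"news", "reviews"} <= coalition:
        score -= 0.10
    return score


values = shapley_values(list(TOY_GAIN), toy_utility)
for name, value in sorted(values.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:+.3f}")
```

In this toy setup the redundancy penalty is split equally between the two overlapping sources, so `news` and `reviews` are each valued 0.05 below their standalone gain, while `tweets` keeps its negative value and would be a candidate for exclusion. The values sum to the utility of using all sources together, which is the efficiency property of the Shapley value.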