In low-resource settings, model transfer can help overcome a lack of labeled data for many tasks and domains. However, predicting useful transfer sources is a challenging problem, as even the most similar sources might lead to unexpected negative transfer results. Thus, ranking methods based on task and text similarity, as suggested in prior work, may not be sufficient to identify promising sources. To tackle this problem, we propose a new approach to automatically determine which and how many sources should be exploited. To this end, we study the effects of model transfer on sequence labeling across various domains and tasks and show that our methods based on model similarity and support vector machines are able to predict promising sources, resulting in performance increases of up to 24 F1 points.