Transfer learning from large language models (LLMs) has emerged as a powerful technique for knowledge-based fine-tuning on a range of tasks and for adapting models to different domains and even languages. However, it remains an open question whether and when transfer learning will work, i.e., lead to positive rather than negative transfer. In this paper, we analyze knowledge transfer across three natural language processing (NLP) tasks, namely text classification, sentiment analysis, and sentence similarity, using three LLMs: BERT, RoBERTa, and XLNet. We evaluate their performance after fine-tuning on target datasets for domain and cross-lingual adaptation, both with and without intermediate-task training on a larger dataset. Our experiments show that fine-tuning without intermediate-task training yields better performance on most tasks, while more general tasks may require a preceding intermediate-task training step. We hope this work will serve as a guide on transfer learning for NLP practitioners.
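For concreteness, the sketch below illustrates the two fine-tuning regimes compared above: (a) direct fine-tuning on the target task, and (b) intermediate-task training on a larger dataset followed by fine-tuning on the target. It is a minimal sketch, not the authors' released code; it assumes Hugging Face Transformers and Datasets, with yelp_polarity standing in for the larger intermediate dataset and imdb for the target task, and the model names and hyperparameters are likewise illustrative assumptions.

```python
# Minimal sketch of the two regimes compared in the abstract.
# Datasets, models, and hyperparameters are illustrative assumptions,
# not the paper's exact experimental setup.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"   # repeat with "roberta-base" and "xlnet-base-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)


def fine_tune(model, dataset_name, output_dir, epochs=3):
    """Fine-tune a sequence-classification model on one dataset and return it."""
    data = load_dataset(dataset_name)
    data = data.map(
        lambda batch: tokenizer(batch["text"], truncation=True,
                                padding="max_length", max_length=128),
        batched=True,
    )
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=epochs,
                             per_device_train_batch_size=16, save_strategy="no")
    trainer = Trainer(model=model, args=args,
                      train_dataset=data["train"], eval_dataset=data["test"])
    trainer.train()
    print(dataset_name, trainer.evaluate())
    return trainer.model


# (a) Direct fine-tuning on the target task only.
direct = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
direct = fine_tune(direct, "imdb", "out/direct")

# (b) Intermediate-task training on the larger dataset, then the target task.
staged = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
staged = fine_tune(staged, "yelp_polarity", "out/intermediate")
staged = fine_tune(staged, "imdb", "out/intermediate_then_target")
```

The paper's comparison then reduces to contrasting the evaluation scores of the directly fine-tuned model with those of the model that received intermediate-task training first, repeated across models, tasks, and adaptation settings.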