Recently, leveraging pre-trained Transformer-based language models in downstream, task-specific models has advanced the state of the art in natural language understanding tasks. However, little research has explored the suitability of this approach in low-resource settings with fewer than 1,000 training data points. In this work, we explore fine-tuning methods for BERT -- a pre-trained Transformer-based language model -- by utilizing pool-based active learning to speed up training while keeping the cost of labeling new data constant. Our experimental results on the GLUE dataset show an advantage in model performance when queries from the pool of unlabeled data maximize the approximate knowledge gain of the model. Finally, we demonstrate and analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters, making it more suitable for low-resource settings.
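The sketch below illustrates the two ideas summarized above: freezing lower BERT layers to shrink the trainable parameter count, and a pool-based active-learning query step. It is a minimal illustration, not the authors' implementation; the predictive-entropy acquisition function stands in for the approximate-knowledge-gain criterion, and the layer split, batch handling, and query size are assumptions.

```python
# Minimal sketch (illustrative only) of layer freezing and a pool-based query step.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Freeze the embedding layer and the first 8 of 12 encoder layers; only the top
# layers and the classification head remain trainable.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:8]:
    for param in layer.parameters():
        param.requires_grad = False

def query_pool(pool_texts, k=16):
    """Select the k pool examples the current model is least certain about
    (predictive entropy as a stand-in acquisition function)."""
    model.eval()
    scores = []
    with torch.no_grad():
        for text in pool_texts:
            inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
            probs = torch.softmax(model(**inputs).logits, dim=-1)
            scores.append(-(probs * probs.log()).sum().item())
    # Indices of the k highest-entropy examples; these would be sent for labeling.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```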