Fine-tuning pre-trained language models has recently become a common practice in building NLP models for various tasks, especially few-shot tasks. We argue that under the few-shot setting, formulating fine-tuning closer to the pre-training objective should unleash more of the benefits of pre-trained language models. In this work, we take few-shot named entity recognition (NER) for a pilot study, where existing fine-tuning strategies are quite different from pre-training. We propose a novel few-shot fine-tuning framework for NER, FFF-NER. Specifically, we introduce three new types of tokens, "is-entity", "which-type", and bracket tokens, so that NER fine-tuning can be formulated as (masked) token prediction or generation, depending on the choice of pre-trained language model. In our experiments, we apply FFF-NER to fine-tune both BERT and BART for few-shot NER on several benchmark datasets and observe significant improvements over existing fine-tuning strategies, including sequence labeling, prototype meta-learning, and prompt-based approaches. We further perform a series of ablation studies, showing that few-shot NER performance is strongly correlated with the similarity between fine-tuning and pre-training.
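To make the formulation concrete, the following is a minimal, hypothetical sketch of how a candidate span might be cast as masked token prediction in the spirit described above. The special token names ([L], [R], [IS-ENT], [WHICH]) and the helper build_fff_example are assumptions for illustration only; the exact tokens and template used in FFF-NER may differ.

```python
# Illustrative sketch only: wrap a candidate span with bracket tokens and
# append masked slots for the "is-entity" and "which-type" decisions, so the
# example resembles a masked-token-prediction input. Token names are assumed.
from typing import List, Tuple


def build_fff_example(tokens: List[str], span: Tuple[int, int]) -> List[str]:
    """Return a templated token sequence for one candidate span.

    span is (start, end) with an inclusive start and exclusive end index.
    """
    start, end = span
    # Mark the candidate span with (assumed) bracket tokens.
    wrapped = tokens[:start] + ["[L]"] + tokens[start:end] + ["[R]"] + tokens[end:]
    # Two [MASK] positions: one for the is-entity decision (e.g. yes/no),
    # one for the entity-type word (e.g. person, location).
    return wrapped + ["[IS-ENT]", "[MASK]", "[WHICH]", "[MASK]"]


if __name__ == "__main__":
    sentence = "Barack Obama visited Berlin".split()
    print(" ".join(build_fff_example(sentence, (0, 2))))
    # [L] Barack Obama [R] visited Berlin [IS-ENT] [MASK] [WHICH] [MASK]
```

Because the fine-tuning input now looks like the masked-language-modeling inputs seen during pre-training, a masked language model such as BERT can fill the [MASK] slots directly, while a generative model such as BART could instead be asked to generate the corresponding answer tokens.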