Named Entity Recognition (NER) is the task of identifying named entities in unstructured text and classifying them into predefined classes. NER for languages with limited resources, such as French, remains an open problem because large, robust, labelled datasets are lacking. In this paper, we propose a transformer-based NER approach for French that uses adversarial adaptation to similar-domain or general corpora to improve feature extraction and generalization. We evaluate our approach on three labelled datasets and show that our adaptation framework outperforms the corresponding non-adaptive models across various combinations of transformer models, source datasets, and target corpora.