In this work, we present Lex-BERT, which incorporates lexicon information into Chinese BERT for named entity recognition (NER) tasks in a natural manner. Instead of using word embeddings and a newly designed transformer layer as in FLAT, we identify the boundaries of words in the sentence with special tokens, and the modified sentence is encoded directly by BERT. Our model introduces no new parameters and is more efficient than FLAT. In addition, we do not require any word embeddings to accompany the lexicon collection. Experiments on Ontonotes and ZhCrossNER show that our model outperforms FLAT and other baselines.
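To make the mechanism concrete, the sketch below illustrates the general idea of marking lexicon-matched word boundaries with special tokens and feeding the modified sentence straight into an off-the-shelf Chinese BERT. This is not the paper's released implementation: the choice of marker tokens ([unused1]/[unused2], reused from the reserved BERT vocabulary so that no new parameters are added), the toy lexicon, and the greedy matcher are all illustrative assumptions.

```python
# Minimal, illustrative sketch of boundary-token insertion before BERT encoding.
# Markers, lexicon, and matching strategy are assumptions for demonstration only.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

# [unused1]/[unused2] already exist in the BERT vocabulary; registering them as
# special tokens keeps them unsplit without adding new embeddings (no new parameters).
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["[unused1]", "[unused2]"]}
)

# Hypothetical lexicon of known words/entities.
lexicon = {"南京市", "长江大桥"}

def insert_boundary_markers(sentence, lexicon, start="[unused1]", end="[unused2]"):
    """Greedily wrap lexicon matches with boundary marker tokens (illustrative)."""
    pieces, i = [], 0
    words_by_len = sorted(lexicon, key=len, reverse=True)
    while i < len(sentence):
        match = next((w for w in words_by_len if sentence.startswith(w, i)), None)
        if match:
            pieces.extend([start, match, end])
            i += len(match)
        else:
            pieces.append(sentence[i])
            i += 1
    return " ".join(pieces)

marked = insert_boundary_markers("南京市长江大桥", lexicon)
inputs = tokenizer(marked, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # encoded by unmodified BERT
print(marked)
print(hidden.shape)
```

Because the markers are drawn from tokens the pretrained model already contains, the sentence with boundary information can be encoded by BERT as-is, without any architectural change or extra embedding table.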