Neural language representation models such as BERT, pre-trained on large-scale corpora, can capture rich semantic patterns from plain text and be fine-tuned to consistently improve performance on various NLP tasks. However, existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which takes full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results demonstrate that ERNIE achieves significant improvements on various knowledge-driven tasks while remaining comparable with the state-of-the-art model BERT on other common NLP tasks. The source code of this paper can be obtained from https://github.com/thunlp/ERNIE.
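To make the core idea concrete, below is a minimal illustrative sketch of fusing token representations from a text encoder with pre-trained KG entity embeddings aligned to entity mentions. The module name, dimensions, and fusion rule are assumptions chosen for clarity, not the actual ERNIE architecture described in the paper or released in the repository.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: a toy fusion layer combining token states from a
# BERT-style encoder with KG entity embeddings. Names and dimensions are
# assumptions for illustration, not the actual ERNIE implementation.
class TokenEntityFusion(nn.Module):
    def __init__(self, token_dim=768, entity_dim=100, hidden_dim=768):
        super().__init__()
        self.token_proj = nn.Linear(token_dim, hidden_dim)
        self.entity_proj = nn.Linear(entity_dim, hidden_dim)

    def forward(self, token_states, entity_embs, entity_mask):
        # token_states: (batch, seq_len, token_dim) contextual token vectors
        # entity_embs:  (batch, seq_len, entity_dim) KG embeddings aligned to
        #               entity mentions (zeros where no entity is linked)
        # entity_mask:  (batch, seq_len, 1), 1.0 at entity mentions else 0.0
        return torch.tanh(
            self.token_proj(token_states)
            + entity_mask * self.entity_proj(entity_embs)
        )

if __name__ == "__main__":
    batch, seq_len = 2, 8
    fusion = TokenEntityFusion()
    tokens = torch.randn(batch, seq_len, 768)
    entities = torch.randn(batch, seq_len, 100)
    mask = torch.zeros(batch, seq_len, 1)
    mask[:, 3] = 1.0  # pretend position 3 is linked to a KG entity
    print(fusion(tokens, entities, mask).shape)  # torch.Size([2, 8, 768])
```

In this sketch, entity information only influences positions where an entity mention is linked, which reflects the general intuition of injecting structured knowledge at informative entities rather than uniformly across the sequence.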