We present an approach to semantic grounding of language models (LMs) that conceptualizes the LM as a conditional model generating text given a desired semantic message, formalized as a set of entity-relationship triples. We embed the LM in an auto-encoder by feeding its output to a semantic parser whose output is in the same representation domain as the input message. Compared to a baseline that generates text using greedy search, we demonstrate two techniques that improve the fluency and semantic accuracy of the generated text: the first samples multiple candidate text sequences from which the semantic parser chooses; the second fine-tunes the language model while keeping the semantic parser frozen, improving the semantic accuracy of the auto-encoder. We carry out experiments on the English WebNLG 3.0 data set, using BLEU to measure the fluency of generated text and standard parsing metrics to measure semantic accuracy. We show that our proposed approaches significantly improve on the greedy search baseline. Human evaluation corroborates the results of the automatic evaluation experiments.
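The first technique, parser-based reranking of sampled candidates, can be illustrated with a minimal self-contained sketch. The generator and parser below are toy placeholders (the paper uses a neural LM and a neural semantic parser); the selection logic, scoring each candidate by triple-level F1 between the parser's output and the input message, is the part the sketch demonstrates.

```python
import random

def sample_candidates(triples, n=5, seed=0):
    # Toy stand-in for sampling from the LM: emits n candidate
    # verbalizations, some of which drop triples (simulating
    # semantic errors that greedy decoding cannot recover from).
    rng = random.Random(seed)
    candidates = []
    for _ in range(n):
        kept = [t for t in triples if rng.random() > 0.3]
        candidates.append(" . ".join(f"{s} {r} {o}" for s, r, o in kept))
    return candidates

def parse(text):
    # Toy stand-in for the semantic parser: maps text back to the
    # triple representation domain of the input message.
    triples = set()
    for clause in text.split(" . "):
        parts = clause.split()
        if len(parts) == 3:
            triples.add(tuple(parts))
    return triples

def triple_f1(pred, gold):
    # Standard parsing metric: F1 over predicted vs. gold triples.
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    precision, recall = tp / len(pred), tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def rerank(triples, n=5):
    # Sample n candidates and let the parser choose the one whose
    # parsed meaning best matches the input message.
    gold = set(triples)
    candidates = sample_candidates(triples, n)
    return max(candidates, key=lambda c: triple_f1(parse(c), gold))
```

With this setup, reranking recovers a candidate that verbalizes the full message even when individual samples drop triples, which is the mechanism by which candidate sampling improves semantic accuracy over a single greedy decode.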