The ability to understand and generate language sets human cognition apart from that of other known life forms. We study a way of combining two of the most successful routes to the meaning of language--statistical language models and symbolic semantic formalisms--in the task of semantic parsing. Building on a transition-based Abstract Meaning Representation (AMR) parser, AmrEager, we explore the utility of incorporating pretrained context-aware word embeddings--such as BERT and RoBERTa--into AMR parsing, contributing a new parser we dub AmrBerger. Experiments show that these rich lexical features alone are not particularly helpful for improving the parser's overall performance, as measured by the SMATCH score, relative to its non-contextual counterpart, whereas additional concept information enables the system to outperform the baselines. Through a lesion (ablation) study, we find that the use of contextual embeddings helps make the system more robust to the removal of explicit syntactic features. These findings expose the strengths and weaknesses of contextual embeddings and language models in their current form, and motivate a deeper understanding thereof.
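As a minimal sketch (not the AmrBerger implementation itself), the snippet below illustrates how pretrained contextual embeddings can be extracted and aligned to whole words before being fed to a transition-based parser as lexical features; the model name, mean-pooling of subword pieces, and the example sentence are assumptions for illustration only.

```python
# Hypothetical sketch: per-word contextual embeddings via HuggingFace Transformers.
# Not the authors' code; model choice and pooling strategy are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")
model.eval()

def contextual_word_embeddings(words):
    """Return one vector per input word by mean-pooling its subword pieces."""
    inputs = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state.squeeze(0)  # (num_pieces, dim)
    word_ids = inputs.word_ids()
    vectors = []
    for idx in range(len(words)):
        piece_rows = [i for i, wid in enumerate(word_ids) if wid == idx]
        vectors.append(hidden[piece_rows].mean(dim=0))
    return torch.stack(vectors)  # (num_words, hidden_dim)

# Example: token-level features that a transition-based parser could consume.
emb = contextual_word_embeddings("The boy wants to go".split())
print(emb.shape)  # torch.Size([5, 768]) for bert-base-cased
```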