Domain-general semantic parsing, in which a semantic parser robustly parses sentences from domains outside those on which it was trained, is a long-standing goal in natural language processing. Current approaches largely rely on additional supervision from each new domain in order to generalize to it. We present a generative model of natural language utterances and logical forms and demonstrate its application to semantic parsing. Our approach relies on domain-independent supervision to generalize to new domains. We derive and implement efficient algorithms for training, parsing, and sentence generation. The work relies on a novel application of hierarchical Dirichlet processes (HDPs) for structured prediction, which we also present in this manuscript. This manuscript is an excerpt of chapter 4 from the Ph.D. thesis of Saparov (2022), where the model plays a central role in a larger natural language understanding system. It provides a simplified and more complete presentation of the work first introduced in Saparov, Saraswat, and Mitchell (2017): the descriptions and proofs of correctness of the training, parsing, and sentence generation algorithms are much simplified. In addition, we extend the earlier work with a new model of word morphology, which utilizes the comprehensive morphological data from Wiktionary.