The diversity and Zipfian frequency distribution of natural language predicates in corpora lead to sparsity when learning Entailment Graphs (EGs). As symbolic models of natural language inference, EGs cannot recover if a novel premise or hypothesis is missing at test time. In this paper we address the problem of vertex sparsity by introducing a new method of graph smoothing that uses a Language Model to find the nearest approximations of missing predicates. We improve recall by 25.1 and 16.3 absolute percentage points on two difficult directional entailment datasets while exceeding average precision, and show that the method is complementary to improvements in edge sparsity. We further analyze language model embeddings and discuss why they are naturally suited to premise smoothing, but not hypothesis smoothing. Finally, we formalize a theory for smoothing a symbolic inference method by constructing transitive chains to smooth both the premise and the hypothesis.
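As a rough illustration of the idea of approximating a missing predicate with its nearest in-graph neighbour in a language-model embedding space, consider the following minimal sketch. It is not the paper's implementation: the `embed`dings, the predicate inventory, and the toy vectors below are hypothetical stand-ins for contextual LM representations.

```python
# Minimal sketch (assumptions noted above): nearest-neighbour smoothing of an
# out-of-vocabulary predicate over LM embeddings of in-graph predicates.
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def nearest_in_graph(query_vec: np.ndarray,
                     graph_preds: dict[str, np.ndarray]) -> str:
    """Return the graph predicate whose embedding is closest to the query."""
    return max(graph_preds, key=lambda p: cosine(query_vec, graph_preds[p]))


# Toy usage with random stand-in embeddings; a real system would embed each
# predicate with a language model instead.
rng = np.random.default_rng(0)
graph_preds = {p: rng.normal(size=16) for p in ["acquire", "purchase", "sell"]}
oov_vec = graph_preds["purchase"] + 0.01 * rng.normal(size=16)  # unseen "buy"
print(nearest_in_graph(oov_vec, graph_preds))  # -> "purchase"
```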