Textual descriptions of the physical world implicitly mention commonsense facts, while commonsense knowledge bases explicitly represent such facts as triples. Compared with the dramatically increasing volume of text data, existing knowledge bases are far from complete. Most prior studies on knowledge base population focus mainly on Freebase, and automatically completing commonsense knowledge bases to improve their coverage remains under-explored. In this paper, we propose a new task: mining commonsense facts from raw text that describes the physical world. We build an effective new model that fuses information from both sequential text and existing knowledge base resources. We also create two large annotated datasets for commonsense knowledge base completion, each with approximately 200k instances. Empirical results demonstrate that our model significantly outperforms the baselines.