The ability to understand and generate similes is an essential step toward human-level AI. However, a considerable gap remains between machine intelligence and human cognition in handling similes, since deep models based on statistical distributions tend to favor high-frequency similes. Hence, a large-scale symbolic knowledge base of similes is required, as it supports the modeling of diverse yet low-frequency similes while facilitating additional evaluation and reasoning. To bridge this gap, we propose a novel framework for large-scale simile knowledge base construction, as well as two probabilistic metrics that enable an improved understanding of simile phenomena in natural language. Using this framework, we construct MAPS-KB, a million-scale probabilistic simile knowledge base covering 4.3 million triplets over 0.4 million terms from 70 GB of corpora. We conduct extensive experiments to justify the effectiveness and necessity of the methods in our framework. We also apply MAPS-KB to three downstream tasks and achieve state-of-the-art performance, further demonstrating the value of MAPS-KB.