In this work, we focus on the task of generating SPARQL queries from natural language questions, which can then be executed on Knowledge Graphs (KGs). We assume that gold entities and relations have been provided, and the remaining task is to arrange them in the right order, along with SPARQL vocabulary and input tokens, to produce the correct SPARQL query. Pre-trained Language Models (PLMs) have not been explored in depth on this task so far, so we experiment with BART, T5, and PGNs (Pointer Generator Networks) with BERT embeddings, looking for new baselines in the PLM era for this task, on the DBpedia and Wikidata KGs. We show that T5 requires special input tokenisation, but produces state-of-the-art performance on the LC-QuAD 1.0 and LC-QuAD 2.0 datasets, outperforming task-specific models from previous works. Moreover, these methods enable semantic parsing for questions where a part of the input needs to be copied to the output query, thus enabling a new paradigm in KG semantic parsing.