In this work, we present GETT-QA, an end-to-end Knowledge Graph Question Answering (KGQA) system. GETT-QA uses T5, a popular text-to-text pre-trained language model. The model takes a natural-language question as input and produces a simplified form of the intended SPARQL query. In this simplified form, the model does not produce entity and relation IDs directly; instead, it produces the corresponding entity and relation labels, which are grounded to KG entity and relation IDs in a subsequent step. To further improve the results, we instruct the model to produce a truncated version of the KG embedding for each entity. The truncated KG embedding enables a finer search for disambiguation purposes. We find that T5 is able to learn the truncated KG embeddings without any change to the loss function, which improves KGQA performance. As a result, we report strong results on the LC-QuAD 2.0 and SimpleQuestions-Wikidata datasets for end-to-end KGQA over Wikidata.
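The grounding step described above can be illustrated with a minimal sketch. It assumes that candidates are first retrieved by exact label match and then ranked by Euclidean distance between the predicted truncated embedding and the leading dimensions of each candidate's stored embedding. The abstract does not specify the retrieval or ranking procedure, so the toy entity table, the IDs, the embedding values, and the names `truncate` and `ground_entity` are all hypothetical.

```python
import math

# Hypothetical toy KG: entity ID -> (label, embedding).
# IDs and vectors are invented for illustration, not taken from Wikidata.
TOY_ENTITIES = {
    "ent:A": ("Paris", [0.9, 0.1, 0.3, 0.7]),
    "ent:B": ("Paris", [-0.8, 0.5, 0.2, 0.1]),
    "ent:C": ("Berlin", [0.7, 0.2, 0.1, 0.6]),
}


def truncate(vec, dims):
    """Keep the first `dims` components (this truncation scheme is an assumption)."""
    return vec[:dims]


def ground_entity(label, predicted_trunc, entities=TOY_ENTITIES):
    """Ground a generated label to an entity ID.

    Candidates are entities whose label matches case-insensitively; ties
    between same-label candidates are broken by distance to the truncated
    embedding that the model is assumed to have generated.
    """
    candidates = [
        (eid, emb)
        for eid, (lab, emb) in entities.items()
        if lab.lower() == label.lower()
    ]
    if not candidates:
        return None  # no label match: nothing to ground
    dims = len(predicted_trunc)
    best_id, _ = min(
        candidates,
        key=lambda c: math.dist(truncate(c[1], dims), predicted_trunc),
    )
    return best_id


if __name__ == "__main__":
    # Two entities share the label "Paris"; the truncated embedding decides.
    print(ground_entity("Paris", [0.85, 0.15]))
    print(ground_entity("Paris", [-0.7, 0.4]))
```

The label alone cannot separate the two "Paris" entries in this toy table, which is the situation the truncated embedding is meant to resolve. A real system would likely replace the exact-match lookup with a text search over KG labels.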