Existing approaches to Question Answering over Knowledge Graphs (KGQA) generalize poorly, often because of the standard i.i.d. assumption on the underlying dataset. Recently, three levels of generalization for KGQA were defined: i.i.d., compositional, and zero-shot. We analyze 25 well-known KGQA datasets for 5 different Knowledge Graphs (KGs). We show that, according to this definition, many existing and online available KGQA datasets are either not suited to training a generalizable KGQA system or are based on discontinued and outdated KGs. Generating new datasets is a costly process and is thus not an option for smaller research groups and companies. In this work, we propose a mitigation method for re-splitting available KGQA datasets so that they can be used to evaluate generalization, without any cost or manual effort. We test our hypothesis on three KGQA datasets: LC-QuAD, LC-QuAD 2.0, and QALD-9. Experiments on the re-split KGQA datasets demonstrate the method's effectiveness towards generalizability. The code and a unified way to access 18 available datasets are online at https://github.com/semantic-systems/KGQA-datasets and https://github.com/semantic-systems/KGQA-datasets-generalization.
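To make the three generalization levels concrete, the following is a minimal sketch of how a test question might be assigned a level relative to a training split. It is not the re-splitting method of this work, whose details are not given here. It assumes, as a simplification, that each question is represented only by the set of schema items (relations and classes) used in its query; the function name `generalization_level` and the example identifiers are hypothetical.

```python
from typing import FrozenSet, List


def generalization_level(
    test_items: FrozenSet[str],
    train_item_sets: List[FrozenSet[str]],
) -> str:
    """Assign a generalization level to one test question.

    Assumption (simplification): a question is reduced to the set of schema
    items in its query, ignoring the structure of the logical form.
    """
    # All schema items that appear anywhere in the training split.
    seen_items = set().union(*train_item_sets)

    # Zero-shot: at least one schema item never occurs in training.
    if not test_items <= seen_items:
        return "zero-shot"

    # i.i.d.: the exact combination of schema items occurs in training.
    if test_items in set(train_item_sets):
        return "i.i.d."

    # Compositional: every item was seen, but not in this combination.
    return "compositional"


# Hypothetical training split with three questions.
train = [
    frozenset({"dbo:author"}),
    frozenset({"dbo:birthPlace"}),
    frozenset({"dbo:author", "dbo:country"}),
]

levels = [
    generalization_level(frozenset({"dbo:author", "dbo:country"}), train),
    generalization_level(frozenset({"dbo:author", "dbo:birthPlace"}), train),
    generalization_level(frozenset({"dbo:spouse"}), train),
]
print(levels)
```

Under this simplification, a split suited to evaluating generalization would contain test questions at all three levels, whereas a purely random split of a template-generated dataset tends to yield mostly i.i.d. questions.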