Large-scale pre-trained language models (PLMs) such as BERT have achieved great success and become a milestone in natural language processing (NLP). Adopting PLMs as the backbone for downstream tasks is now the consensus of the NLP community, and recent work on knowledge graph question answering (KGQA) routinely builds on BERT or its variants. However, there is still no comprehensive study comparing how different PLMs perform in KGQA. To this end, we summarize two basic PLM-based KGQA frameworks, free of additional neural network modules, and use them to compare nine PLMs in terms of accuracy and efficiency. In addition, we present three benchmarks over larger-scale KGs, derived from the popular SimpleQuestions benchmark, to investigate the scalability of PLMs. We carefully analyze the results of all PLM-based basic frameworks on these benchmarks and on two other popular datasets, WebQuestionsSP and FreebaseQA, and find that knowledge distillation techniques and knowledge-enhanced PLMs are promising for KGQA. Furthermore, we evaluate ChatGPT, which has drawn great attention in the NLP community, and demonstrate both its impressive capabilities and its limitations in zero-shot KGQA. We have released the code and benchmarks to promote the use of PLMs for KGQA.
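To make the idea of a "basic PLM-based KGQA framework without additional neural network modules" concrete, here is a minimal sketch of one common component: scoring candidate KG relations for a simple question with a PLM plus a plain classification head. This is an illustrative assumption, not the paper's exact framework; the model name, the question, and the Freebase-style candidate relations are all hypothetical, and the head would need fine-tuning on KGQA data before the scores are meaningful.

```python
# Minimal sketch: rank candidate relations for a simple question with a PLM.
# Any of the compared PLMs could be swapped in via MODEL_NAME.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # illustrative choice

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Binary head: does this (question, relation) pair match? (untrained here)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

question = "who wrote the book the hobbit"  # hypothetical SimpleQuestions-style query
candidate_relations = [                     # hypothetical Freebase-style relations
    "book.written_work.author",
    "book.written_work.date_of_first_publication",
    "people.person.place_of_birth",
]

# Encode each (question, relation) pair and keep the relation whose
# "match" probability is highest.
with torch.no_grad():
    inputs = tokenizer(
        [question] * len(candidate_relations),
        candidate_relations,
        padding=True,
        truncation=True,
        return_tensors="pt",
    )
    logits = model(**inputs).logits
    match_scores = torch.softmax(logits, dim=-1)[:, 1]

best = candidate_relations[int(match_scores.argmax())]
print("Predicted relation:", best)
```

In this style of framework, the PLM itself does all the representation work and only a lightweight head sits on top, which is what makes it a clean testbed for comparing different PLMs on accuracy and efficiency.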