This study investigates the task of knowledge-based question generation (KBQG). Conventional KBQG works generate questions from fact triples in the knowledge graph, which cannot express complex operations such as aggregation and comparison in SPARQL. Moreover, because annotating large-scale SPARQL-question pairs is costly, KBQG from SPARQL under low-resource scenarios urgently needs to be explored. Recently, generative pre-trained language models (PLMs) such as T5 and BART, which are typically trained in a natural language (NL)-to-NL paradigm, have proven effective for low-resource generation; however, how to effectively utilize them to generate NL questions from non-NL SPARQL remains challenging. To address these challenges, we propose AutoQGS, an auto-prompt approach for low-resource KBQG from SPARQL. First, we propose generating questions directly from SPARQL so that the KBQG task can handle complex operations. Second, we propose an auto-prompter trained on large-scale unsupervised data to rephrase SPARQL into NL descriptions, smoothing the low-resource transformation from non-NL SPARQL to NL questions with PLMs. Experimental results on WebQuestionsSP, ComplexWebQuestions 1.1, and PathQuestions show that our model achieves state-of-the-art performance, especially in low-resource settings. Furthermore, a corpus of 330k factoid complex question-SPARQL pairs is generated for further KBQG research.
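As an illustration of the kind of complex operation mentioned above (this example is not taken from the paper), the following sketch shows a COUNT aggregation that a single fact triple cannot express; the Freebase-style entity and predicate IRIs match the style of WebQuestionsSP/ComplexWebQuestions but are used here purely for exposition.

```sparql
# Illustrative sketch only: an aggregation query underlying a question like
# "How many children does Barack Obama have?". A triple-based KBQG system
# can verbalize a single fact, but not the COUNT over the matched bindings.
PREFIX ns: <http://rdf.freebase.com/ns/>
SELECT (COUNT(?child) AS ?numChildren) WHERE {
  ns:m.02mjmr ns:people.person.children ?child .
}
```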