Multi-hop Question Generation is the task of generating questions that require the reader to reason over and combine information spread across multiple passages using several reasoning steps. Chain-of-thought rationale generation has been shown to improve performance on multi-step reasoning tasks and make model predictions more interpretable. However, few-shot performance gains from including rationales have largely been observed only in 100B+ parameter language models, and otherwise require large-scale manual rationale annotation. In this work, we introduce a new framework for applying chain-of-thought-inspired structured rationale generation to multi-hop question generation under a very low supervision regime (8- to 128-shot). We propose to annotate a small number of examples following our proposed multi-step rationale schema, treating each reasoning step as a separate task to be performed by a generative language model. We show that our framework leads to improved control over the difficulty of the generated questions and better performance compared to baselines trained without rationales, both on automatic evaluation metrics and in human evaluation. Importantly, we show that this is achievable with a modest model size.