Recent work has shown that large language models are capable of generating natural language reasoning steps, or Chains-of-Thought (CoT), to answer a multi-step question when prompted to do so. This is insufficient, however, when the necessary knowledge is not available or up-to-date within the model's parameters. A straightforward approach to address this is to retrieve text from an external knowledge source using the question as a query and prepend it as context to the model's input. This, too, is insufficient for multi-step QA, where \textit{what to retrieve} depends on \textit{what has already been derived}. To address this issue, we propose IRCoT, a new approach that interleaves retrieval with CoT for multi-step QA: the CoT guides the retrieval, and the retrieved results in turn improve the CoT. Our experiments with GPT-3 show substantial improvements in retrieval (up to 22 points) and downstream QA (up to 16 points) over the baselines on four datasets: HotpotQA, 2WikiMultihopQA, MuSiQue, and IIRC. Notably, our method also works well for much smaller models such as Flan-T5-large (0.7B) without any additional training.
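To make the interleaving concrete, the following is a minimal sketch of the loop described above, assuming hypothetical \texttt{retrieve} and \texttt{generate\_cot\_sentence} helpers; the termination heuristic and paragraph pooling below are illustrative choices, not the exact implementation:

\begin{verbatim}
# A minimal sketch of the IRCoT interleaving loop (Python).
# `retrieve` and `generate_cot_sentence` are hypothetical helpers:
# the former queries an external knowledge source, the latter asks
# an LLM to extend the chain of thought by one sentence.

def ircot(question, retrieve, generate_cot_sentence,
          max_steps=8, k=4):
    """Alternate CoT generation and retrieval until an answer
    sentence is produced or the step budget is exhausted."""
    # Initial retrieval uses the question itself as the query.
    paragraphs = retrieve(question, k)
    cot = []  # chain-of-thought sentences generated so far
    for _ in range(max_steps):
        # Reason step: extend the CoT, conditioned on the question,
        # all retrieved paragraphs, and the CoT so far.
        sentence = generate_cot_sentence(question, paragraphs, cot)
        cot.append(sentence)
        if "answer is" in sentence.lower():  # heuristic stop check
            break
        # Retrieve step: use the newest CoT sentence as the next
        # query, so what is retrieved depends on what has already
        # been derived.
        for p in retrieve(sentence, k):
            if p not in paragraphs:  # keep a de-duplicated pool
                paragraphs.append(p)
    return cot, paragraphs
\end{verbatim}

In this sketch, the accumulated paragraph pool would also serve as context for a final answer-extraction step, mirroring how retrieval and reasoning inform each other in the approach outlined above.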