Question-answering datasets require a broad set of reasoning skills. We show how to use question decompositions to teach language models these broad reasoning skills in a robust fashion. Specifically, we use widely available QDMR representations to programmatically create hard-to-cheat synthetic contexts for real questions in six multi-step reasoning datasets. These contexts are carefully designed to avoid reasoning shortcuts prevalent in real contexts that prevent models from learning the right skills. This results in a pretraining dataset, named TeaBReaC, containing 525K multi-step questions (with associated formal programs) covering about 900 reasoning patterns. We show that pretraining standard language models (LMs) on TeaBReaC before fine-tuning them on target datasets improves their performance by up to 13 F1 points across 4 multi-step QA datasets, with up to 21 point gain on more complex questions. The resulting models also demonstrate higher robustness, with a 5-8 F1 point improvement on two contrast sets. Furthermore, TeaBReaC pretraining substantially improves model performance and robustness even when starting with numerate LMs pretrained using recent methods (e.g., PReasM, POET). Our work thus shows how to effectively use decomposition-guided contexts to robustly teach multi-step reasoning.