Neural language models (LMs) have achieved impressive results on various language-based reasoning tasks by utilizing latent knowledge encoded in their own pretrained parameters. To make this reasoning process more explicit, recent works retrieve a rationalizing LM's internal knowledge by training or prompting it to generate free-text rationales, which can be used to guide task predictions made by either the same LM or a separate reasoning LM. However, rationalizing LMs require expensive rationale annotation and/or computation, without any assurance that their generated rationales improve LM task performance or faithfully reflect LM decision-making. In this paper, we propose PINTO, an LM pipeline that rationalizes via prompt-based learning, and learns to faithfully reason over rationales via counterfactual regularization. First, PINTO maps out a suitable reasoning process for the task input by prompting a frozen rationalizing LM to generate a free-text rationale. Second, PINTO's reasoning LM is fine-tuned to solve the task using the generated rationale as context, while regularized to output less confident predictions when the rationale is perturbed. Across four datasets, we show that PINTO significantly improves the generalization ability of the reasoning LM, yielding higher performance on both in-distribution and out-of-distribution test sets. Also, we find that PINTO's rationales are more faithful to its task predictions than those generated by competitive baselines.
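To make the second stage concrete, below is a minimal sketch of the counterfactual-regularization idea described above: the reasoning LM is trained on the task with the generated rationale as context, and additionally penalized for being confident when the rationale is perturbed. The function name, perturbation setup, loss weighting, and the entropy-based penalty are illustrative assumptions for exposition, not PINTO's exact training recipe.

```python
# Illustrative sketch (assumed details, not the paper's exact loss):
# the reasoning LM yields classification logits given (input, rationale),
# and a second set of logits given (input, perturbed rationale).
import torch
import torch.nn.functional as F

def pinto_style_loss(logits_with_rationale: torch.Tensor,
                     logits_with_perturbed_rationale: torch.Tensor,
                     labels: torch.Tensor,
                     reg_weight: float = 0.5) -> torch.Tensor:
    # Task loss: predict the label from the input plus the generated rationale.
    task_loss = F.cross_entropy(logits_with_rationale, labels)

    # Counterfactual regularization: when the rationale is perturbed
    # (e.g., tokens masked or replaced), push the prediction toward lower
    # confidence by maximizing its entropy (minimizing negative entropy).
    probs = F.softmax(logits_with_perturbed_rationale, dim=-1)
    neg_entropy = (probs * torch.log(probs.clamp_min(1e-9))).sum(dim=-1).mean()

    return task_loss + reg_weight * neg_entropy

# Toy usage with random logits for a 4-way multiple-choice task (hypothetical shapes).
logits_orig = torch.randn(8, 4)
logits_pert = torch.randn(8, 4)
labels = torch.randint(0, 4, (8,))
loss = pinto_style_loss(logits_orig, logits_pert, labels)
```

The intuition behind this design is that a model whose predictions degrade gracefully (become less confident) when its rationale is corrupted must actually be relying on that rationale, which is what makes the rationales more faithful to the task predictions.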