When a large language model (LLM) performs complex reasoning by chain of thought (CoT), it can be highly sensitive to individual mistakes. Existing approaches have had to train additional verifiers to address this issue. Humans, after inferring a conclusion, often check it by re-verifying it, which can avoid some mistakes. We propose a new method called self-verification that uses the conclusion of the CoT as a condition to build a new sample and asks the LLM to re-predict the original conditions, which have been masked. We calculate an explainable verification score based on the accuracy of these re-predictions. This method improves accuracy on multiple arithmetic and logical reasoning datasets in the few-shot setting. We demonstrate that LLMs can conduct explainable self-verification of their own conclusions and achieve competitive reasoning performance. Extensive experiments show that our method helps multiple large language models with self-verification avoid interference from incorrect CoT. Code is available at \url{https://github.com/WENGSYX/Self-Verification}
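To make the described procedure concrete, the following is a minimal sketch of the self-verification loop, not the paper's released implementation: the \texttt{llm} callable, the \texttt{self\_verify} and \texttt{extract\_answer} helpers, and the \texttt{condition\_values} parameter are all illustrative assumptions.

\begin{verbatim}
import re

def extract_answer(text):
    """Pull the last number out of a model completion (toy parser)."""
    nums = re.findall(r"-?\d+\.?\d*", text)
    return nums[-1] if nums else ""

def self_verify(llm, question, condition_values, num_candidates=5):
    """Rank candidate CoT conclusions by how well the LLM can re-predict
    masked original conditions when each conclusion is taken as given.
    `llm` is assumed to be a text-in/text-out completion function."""
    # Forward reasoning: sample several chain-of-thought completions.
    answers = {extract_answer(llm(question + "\nLet's think step by step."))
               for _ in range(num_candidates)}

    scores = {}
    for ans in answers:
        recovered = 0
        for value in condition_values:
            # Backward verification: mask one original condition ("X")
            # and state the candidate conclusion as a known fact.
            masked = question.replace(str(value), "X", 1)
            prompt = (masked + "\nSuppose the answer is " + str(ans) +
                      ". What is the value of X?")
            recovered += int(extract_answer(llm(prompt)) == str(value))
        # Verification score = fraction of masked conditions recovered.
        scores[ans] = recovered / max(len(condition_values), 1)

    # Return the conclusion that best explains the original conditions.
    return max(scores, key=scores.get)
\end{verbatim}

The score is explainable in the sense that each masked condition is either recovered or not, so the final ranking can be traced back to individual verification checks rather than an opaque learned verifier.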