When a large language model (LLM) performs complex reasoning by chain of thought (CoT), it can be highly sensitive to individual mistakes, and existing approaches have had to train separate verifiers to address this issue. In contrast, after humans infer a conclusion, they often check it by re-verifying it, which can avoid some mistakes. We propose a new method called self-verification that uses the conclusion of the CoT as a condition to build a new sample and asks the LLM to re-predict the original conditions, which have been masked. We then calculate an explainable verification score based on the re-prediction accuracy. This method improves accuracy on multiple arithmetic and logical reasoning datasets in the few-shot setting, demonstrating that LLMs can conduct explainable self-verification of their own conclusions and achieve competitive reasoning performance. Extensive experiments show that our method helps multiple large language models avoid interference from incorrect CoT. Code is available at \url{https://github.com/WENGSYX/Self-Verification}
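The following is a minimal sketch of the verification-score step described above, not the paper's exact implementation. It assumes a caller-supplied completion function (`llm_generate`, a hypothetical stand-in for any LLM API) and that each original condition is provided as a pair of (condition text with its key value masked as "X", true value); both of these interfaces are illustrative assumptions.

```python
from typing import Callable, List, Tuple


def verification_score(
    llm_generate: Callable[[str], str],
    masked_conditions: List[Tuple[str, str]],
    question: str,
    candidate_answer: str,
    n_samples: int = 5,
) -> float:
    """Score one candidate CoT answer by conclusion-conditioned re-prediction.

    The candidate answer is stated as a known fact, one original condition at a
    time is masked as 'X', the model re-predicts X, and the score is the
    fraction of correct re-predictions across conditions and samples.
    """
    hits = total = 0
    for i, (masked_text, true_value) in enumerate(masked_conditions):
        # Keep every other condition in its original (unmasked) form.
        context = [
            text if j == i else text.replace("X", value)
            for j, (text, value) in enumerate(masked_conditions)
        ]
        prompt = (
            "\n".join(context)
            + f"\nThe answer to the question '{question}' is {candidate_answer}.\n"
            + "What is the value of X? X ="
        )
        for _ in range(n_samples):
            prediction = llm_generate(prompt).strip()
            hits += int(prediction == true_value)  # simple exact-match check
            total += 1
    return hits / total if total else 0.0


# Among several sampled CoT candidates, the answer with the highest
# verification score would be kept as the final prediction.
```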