When a large language model (LLM) performs complex reasoning by chain of thought (CoT), it can be highly sensitive to individual mistakes. Existing approaches have had to train additional verifiers to address this issue. Humans, after inferring a conclusion, often check it by re-verifying it, which can avoid some mistakes. We propose a new method called self-verification that uses the conclusion of the CoT as a condition to build a new sample and asks the LLM to re-predict the original conditions, which have been masked. We calculate an explainable verification score based on the accuracy of these re-predictions. This method improves accuracy on multiple arithmetic and logical reasoning datasets in the few-shot setting. We demonstrate that LLMs can conduct explainable self-verification of their own conclusions and achieve competitive reasoning performance. Extensive experiments show that our method helps multiple large language models with self-verification avoid interference from incorrect CoT. Code is available at \url{https://github.com/WENGSYX/Self-Verification}
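To make the described procedure concrete, the following is a minimal sketch of the self-verification loop, not the paper's released implementation: the \texttt{llm} callable, the \texttt{self\_verify} and \texttt{extract\_answer} helpers, and the \texttt{condition\_values} parameter are all illustrative assumptions.

\begin{verbatim}
import re

def extract_answer(text):
    """Pull the last number out of a model completion (toy parser)."""
    nums = re.findall(r"-?\d+\.?\d*", text)
    return nums[-1] if nums else ""

def self_verify(llm, question, condition_values, num_candidates=5):
    """Rank candidate CoT conclusions by how well the LLM can re-predict
    masked original conditions when each conclusion is taken as given.
    `llm` is assumed to be a text-in/text-out completion function."""
    # Forward reasoning: sample several chain-of-thought completions.
    answers = {extract_answer(llm(question + "\nLet's think step by step."))
               for _ in range(num_candidates)}

    scores = {}
    for ans in answers:
        recovered = 0
        for value in condition_values:
            # Backward verification: mask one original condition ("X")
            # and state the candidate conclusion as a known fact.
            masked = question.replace(str(value), "X", 1)
            prompt = (masked + "\nSuppose the answer is " + str(ans) +
                      ". What is the value of X?")
            recovered += int(extract_answer(llm(prompt)) == str(value))
        # Verification score = fraction of masked conditions recovered.
        scores[ans] = recovered / max(len(condition_values), 1)

    # Return the conclusion that best explains the original conditions.
    return max(scores, key=scores.get)
\end{verbatim}

The score is explainable in the sense that each masked condition is either recovered or not, so the final ranking can be traced back to individual verification checks rather than an opaque learned verifier.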