Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. Our experimental results on tasks related to symbolic manipulation, compositional generalization, and math reasoning show that least-to-most prompting can generalize to problems more difficult than those seen in the prompts. A notable finding is that when the GPT-3 code-davinci-002 model is used with least-to-most prompting, it can solve the compositional generalization benchmark SCAN under any split (including length split) with an accuracy of at least 99% using just 14 exemplars, compared to an accuracy of only 16% with chain-of-thought prompting. This is particularly noteworthy because neural-symbolic models in the literature that specialize in solving SCAN are trained with the entire training set containing over 15,000 examples. Prompts for all the tasks are included in the Appendix.
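The two-stage procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `llm` function is a hypothetical deterministic stub standing in for a real language model, and the prompt wording is invented for the example.

```python
def llm(prompt: str) -> str:
    # Hypothetical stub standing in for a language model completion call.
    # A real setup would send `prompt` to an actual model endpoint.
    if "break it down" in prompt:
        return "1. How long does each trip take? 2. How many trips fit?"
    return "(model answer)"

def least_to_most(question: str) -> str:
    # Stage 1: decomposition -- ask the model to reduce the complex
    # problem to a sequence of simpler subproblems.
    decomposition = llm(
        f"Q: {question}\nTo solve this, break it down into subproblems:"
    )
    subproblems = [s.strip() for s in decomposition.split("?") if s.strip()]

    # Stage 2: sequential solving -- answer each subproblem in order,
    # appending earlier subproblem/answer pairs to the context so that
    # later subproblems are solved with the help of earlier answers.
    context = f"Q: {question}\n"
    answer = ""
    for sub in subproblems:
        answer = llm(context + f"Subproblem: {sub}?\nA:")
        context += f"Subproblem: {sub}?\nA: {answer}\n"
    return answer  # the answer to the last subproblem answers the question
```

In contrast, chain-of-thought prompting would pose the whole question in a single prompt with worked reasoning in the exemplars; least-to-most differs in that the decomposition and the per-subproblem solving are separate model calls, which is what enables generalizing to harder problems than the exemplars.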