The increasing scale of large language models (LLMs) brings emergent abilities to various complex tasks requiring reasoning, such as arithmetic and commonsense reasoning. It is known that the effective design of task-specific prompts is critical for LLMs' ability to produce high-quality answers. In particular, an effective approach for complex question-and-answer tasks is example-based prompting with chain-of-thought (CoT) reasoning, which significantly improves the performance of LLMs. However, current CoT methods rely on a fixed set of human-annotated exemplars, which are not necessarily the most effective examples for different tasks. This paper proposes a new method, Active-Prompt, to adapt LLMs to different tasks with task-specific example prompts (annotated with human-designed CoT reasoning). For this purpose, we propose a solution to the key problem of determining which questions are the most important and helpful ones to annotate from a pool of task-specific queries. By borrowing ideas from the related problem of uncertainty-based active learning, we introduce several metrics to characterize the uncertainty so as to select the most uncertain questions for annotation. Experimental results demonstrate the superiority of our proposed method, achieving state-of-the-art performance on eight complex reasoning tasks. Further analyses of different uncertainty metrics, pool sizes, zero-shot learning, and the accuracy-uncertainty relationship demonstrate the effectiveness of our method. Our code will be available at https://github.com/shizhediao/active-prompt.
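As a rough illustration of the uncertainty-based selection step described above, the sketch below ranks pool questions by a disagreement-style score computed over k sampled answers and keeps the most uncertain ones for human annotation. This is a minimal sketch under assumptions: the `sample_answers` callable and the toy `fake_sampler` are hypothetical placeholders for an actual LLM sampling routine and are not part of the released code.

```python
# Minimal sketch: select the most uncertain questions from a task-specific pool
# using a disagreement-style metric (fraction of distinct answers among k samples).
from collections import Counter
from typing import Callable, List, Tuple


def disagreement(answers: List[str]) -> float:
    """Fraction of distinct answers among k sampled answers (higher = more uncertain)."""
    return len(set(answers)) / len(answers)


def select_uncertain_questions(
    questions: List[str],
    sample_answers: Callable[[str, int], List[str]],  # hypothetical LLM sampling hook
    k: int = 10,
    n_select: int = 8,
) -> List[Tuple[str, float]]:
    """Score each pool question by disagreement and return the n_select most uncertain."""
    scored = [(q, disagreement(sample_answers(q, k))) for q in questions]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_select]


if __name__ == "__main__":
    # Toy stand-in for LLM sampling: the model answers one question consistently
    # and the other inconsistently, so the latter is flagged for annotation.
    def fake_sampler(question: str, k: int) -> List[str]:
        return ["42"] * k if "easy" in question else [str(i % 4) for i in range(k)]

    pool = ["easy arithmetic question", "hard commonsense question"]
    print(select_uncertain_questions(pool, fake_sampler, k=8, n_select=1))
```

In practice, other uncertainty metrics (e.g., entropy over the sampled answer distribution) can be substituted for the disagreement score without changing the selection loop.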