Reasoning with Sampling: Your Base Model is Smarter Than You Think

Frontier reasoning models have exhibited incredible capabilities across a wide array of disciplines, driven by posttraining large language models (LLMs) with reinforcement learning (RL). However, despite the widespread success of this paradigm, much of the literature has been devoted to disentangling truly novel behaviors that emerge during RL but are not present in the base models. In our work, we approach this question from a different angle, asking instead whether comparable reasoning capabilities can be elicited from base models at inference time by pure sampling, without any additional training. Inspired by Markov chain Monte Carlo (MCMC) techniques for sampling from sharpened distributions, we propose a simple iterative sampling algorithm that leverages the base models' own likelihoods. Across different base models, we show that our algorithm offers substantial boosts in reasoning that nearly match, and in some cases outperform, those from RL on a wide variety of single-shot tasks, including MATH500, HumanEval, and GPQA. Moreover, our sampler avoids the collapse in diversity over multiple samples that is characteristic of RL posttraining. Crucially, our method does not require training, curated datasets, or a verifier, suggesting broad applicability beyond easily verifiable domains.
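To make the phrase "sampling from sharpened distributions" concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm, which the abstract does not specify. It assumes the sharpened target is a power distribution proportional to p(x)^alpha, and it replaces the language model with a small categorical distribution. The names `sharpened_mh`, `sharpened_target`, and `alpha` are hypothetical and chosen only for illustration. The sampler is a standard independence Metropolis-Hastings chain: proposals are drawn from the base distribution p itself, so the acceptance ratio simplifies to (p(x')/p(x))^(alpha - 1) and needs only the base likelihoods.

```python
import math
import random


def sharpened_target(probs, alpha):
    """Exact normalized power distribution p^alpha (for checking the sampler)."""
    weights = [p ** alpha for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


def sharpened_mh(probs, alpha, steps, rng):
    """Independence Metropolis-Hastings chain targeting probs[i] ** alpha.

    Assumptions (illustrative, not from the source): a toy categorical
    distribution `probs` with strictly positive entries stands in for the
    base model, and `alpha` > 1 controls the amount of sharpening.
    """
    states = list(range(len(probs)))
    # Start from a draw of the base distribution.
    current = rng.choices(states, weights=probs)[0]
    samples = []
    for _ in range(steps):
        # Propose a fresh draw from the base distribution itself.
        proposal = rng.choices(states, weights=probs)[0]
        # Target p^alpha, proposal p: the ratio is (p(x') / p(x)) ** (alpha - 1).
        log_ratio = (alpha - 1.0) * (
            math.log(probs[proposal]) - math.log(probs[current])
        )
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = proposal
        samples.append(current)
    return samples


if __name__ == "__main__":
    base = [0.5, 0.3, 0.2]
    chain = sharpened_mh(base, alpha=4.0, steps=20000, rng=random.Random(0))
    empirical = [chain.count(s) / len(chain) for s in range(len(base))]
    print("exact    :", sharpened_target(base, 4.0))
    print("empirical:", empirical)
```

With `alpha` greater than 1, the chain spends more of its time on the outcomes the base distribution already prefers, which is the sense in which the target is "sharpened". No reward signal or verifier appears anywhere in the loop.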

