High-quality synthetic multi-step reasoning data can significantly enhance the performance of large language models on various tasks. However, most existing methods rely on rejection sampling, which generates trajectories independently and suffers from inefficiency and imbalanced sampling across problems of varying difficulty. In this work, we introduce FastMCTS, an innovative data synthesis strategy inspired by Monte Carlo Tree Search. FastMCTS provides a more efficient sampling method for multi-step reasoning data, offering step-level evaluation signals and promoting balanced sampling across problems of different difficulty levels. Experiments on both English and Chinese reasoning datasets demonstrate that, as the number of generated tokens scales up, FastMCTS produces over 30\% more correct reasoning paths than rejection sampling. Furthermore, under comparable synthetic data budgets, models trained on FastMCTS-generated data outperform those trained on rejection sampling data by 3.9\% on average across multiple benchmarks. As a lightweight sampling strategy, FastMCTS offers a practical and efficient alternative for synthesizing high-quality reasoning data. Our code will be released soon.