Function calling (FC) empowers large language models (LLMs) and autonomous agents to interface with external tools, a critical capability for solving complex, real-world problems. As this ability becomes increasingly central to advanced AI systems, the need for high-quality, multi-turn training data to develop and refine it cannot be overstated. Existing data synthesis methods, such as random environment sampling or multi-agent role-playing, are not powerful enough to generate high-quality data in real-world environments. The practical challenges are threefold: targeted model training, isolation of tool architecture, and multi-turn logical dependency. To address these structural deficiencies, we present FunReason-MT, a novel data synthesis framework for real-world multi-turn tool use. FunReason-MT resolves the complexity barrier in multi-turn FC data by employing 1) Environment-API Graph Interactions to gather varied, high-quality trajectories, 2) Advanced Tool-Query Synthesis to simplify hard query construction, and 3) Guided Iterative Chain for sophisticated chain-of-thought (CoT) generation. Evaluations on the Berkeley Function-Calling Leaderboard (BFCLv3) demonstrate the power of our framework: a 4B model trained on FunReason-MT-generated data achieves state-of-the-art performance among comparably sized models and outperforms most closed-source models. Further performance improvements on BFCLv4 confirm that FunReason-MT provides a reliable and robust data source for agentic learning.