We present Natural Language Tools (NLT), a framework that replaces programmatic JSON tool calling in large language models (LLMs) with natural language outputs. By decoupling tool selection from response generation, NLT eliminates the task interference and format constraints that degrade tool-calling performance. Evaluated across 10 models and 6,400 trials spanning customer service and mental health domains, NLT improves tool-calling accuracy by 18.4 percentage points while reducing output variance by 70%. Open-weight models see the largest gains, surpassing flagship closed-weight alternatives, with implications for model training in both the reinforcement learning and supervised fine-tuning stages. These improvements persist under prompt perturbations and extend tool-calling capabilities to models that lack native support.
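
To make the decoupling idea concrete, here is a minimal sketch of how a separate tool-selection pass might map a model's free-form natural-language output onto a registered tool, in contrast to having the response model emit a schema-conformant JSON function call. The tool names, keyword-matching parser, and prompt wording are illustrative assumptions, not the paper's actual NLT interface.

```python
# Minimal sketch of decoupled, natural-language tool selection (assumed design,
# not the paper's reference implementation).
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[[str], str]


# Hypothetical tool registry for a customer-service assistant.
TOOLS = [
    Tool("check_order_status", "Look up the status of a customer's order.",
         lambda q: f"Order status for: {q}"),
    Tool("escalate_to_human", "Hand the conversation to a human agent.",
         lambda q: "Escalated to a human agent."),
]


def select_tool(selection_output: str) -> Optional[Tool]:
    """Map a free-form tool-selection utterance (e.g. 'I should check the
    order status') onto a registered tool via simple keyword matching.
    In the JSON-calling baseline, the response model itself would have to
    emit a format-constrained function call instead."""
    text = selection_output.lower()
    for tool in TOOLS:
        if all(word in text for word in tool.name.split("_")):
            return tool
    return None


# Usage: tool selection runs as its own pass, so response generation is never
# burdened with JSON format constraints.
selection = "I should check the order status before replying."
tool = select_tool(selection)
if tool is not None:
    print(tool.handler("order #1234"))
```

Because the selection pass only has to produce natural language, the same mechanism can also be layered onto models without native function-calling support, which is the extension the abstract refers to.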