Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q\&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
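For illustration only, the sketch below shows one way the bracketed API-call format described in the paper (calls written inline as text and their results spliced back in before generation continues) could be executed at inference time. The tool registry, function names, and the ASCII "->" arrow here are assumptions for the example; the actual Toolformer pipeline learns to emit and consume such calls during decoding rather than via a post-hoc regex pass.

```python
import datetime
import re

# Hypothetical tool registry; the paper's tools include a calculator,
# a Q&A system, search, translation, and a calendar.
TOOLS = {
    "Calculator": lambda expr: str(round(eval(expr), 2)),  # toy arithmetic only
    "Calendar": lambda _: datetime.date.today().isoformat(),
}

def execute_api_calls(text: str) -> str:
    """Find bracketed API calls of the form [Tool(args)] and splice in
    their results as [Tool(args) -> result], mimicking the inline text
    format the model is trained to produce."""
    pattern = re.compile(r"\[(\w+)\((.*?)\)\]")

    def replace(match: re.Match) -> str:
        tool, args = match.group(1), match.group(2)
        if tool not in TOOLS:
            return match.group(0)  # leave unknown calls untouched
        result = TOOLS[tool](args)
        return f"[{tool}({args}) -> {result}]"

    return pattern.sub(replace, text)

if __name__ == "__main__":
    prompt = "Out of 1400 participants, 400 [Calculator(400/1400)] passed the test."
    print(execute_api_calls(prompt))
    # Out of 1400 participants, 400 [Calculator(400/1400) -> 0.29] passed the test.
```

In the self-supervised setup the abstract describes, the model itself proposes where such calls are useful and which arguments to pass; only calls whose results reduce the loss on future tokens are kept for training.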