Prompting is now the primary way to utilize the multitask capabilities of language models (LMs), but prompts occupy valuable space in the input context window, and re-encoding the same prompt is computationally inefficient. Finetuning and distillation methods allow for specialization of LMs without prompting, but require retraining the model for each task. To avoid this trade-off entirely, we present gisting, which trains an LM to compress prompts into smaller sets of "gist" tokens which can be reused for compute efficiency. Gist models can be easily trained as part of instruction finetuning via a restricted attention mask that encourages prompt compression. On decoder (LLaMA-7B) and encoder-decoder (FLAN-T5-XXL) LMs, gisting enables up to 26x compression of prompts, resulting in up to 40% FLOPs reductions, 4.2% wall time speedups, storage savings, and minimal loss in output quality.
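The "restricted attention mask" mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the mask, so the version below rests on an assumption: in a decoder-only model with the sequence laid out as [prompt tokens | gist tokens | remaining tokens], the positions after the gist span are blocked from attending to the raw prompt, so any information about the prompt has to flow through the gist tokens. The function name `gist_attention_mask` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def gist_attention_mask(n_prompt: int, n_gist: int, n_rest: int) -> np.ndarray:
    """Build a boolean [T, T] mask where mask[i, j] is True if position i may attend to j.

    Assumed layout (not taken from the abstract): prompt, then gist, then the rest.
    """
    total = n_prompt + n_gist + n_rest
    # Start from a standard causal mask: each position sees itself and earlier positions.
    mask = np.tril(np.ones((total, total), dtype=bool))
    # Assumed restriction: positions after the gist span cannot see the raw prompt,
    # so the prompt is only reachable through the gist tokens.
    gist_end = n_prompt + n_gist
    mask[gist_end:, :n_prompt] = False
    return mask


# Example: a 4-token prompt compressed into 1 gist token, followed by 3 more tokens.
mask = gist_attention_mask(n_prompt=4, n_gist=1, n_rest=3)
print(mask.astype(int))
```

Under this assumption, the gist tokens still attend to the full prompt, while everything after them sees only the gist positions and its own causal history. At inference time, only the activations at the gist positions would need to be cached in place of the full prompt, which is where the compression and compute savings described in the abstract would come from.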