批次提示:与大语言示范APP的有效推断 (Batch Prompting: Efficient Inference with Large Language Model APIs) - 专知论文

会员服务 ·

0

推断 · 语言模型化 · 可约的 · Performance · Prompt ·

2023 年 1 月 19 日

Batch Prompting: Efficient Inference with Large Language Model APIs

翻译：批次提示:与大语言示范APP的有效推断

Zhoujun Cheng,Jungo Kasai,Tao Yu

from arxiv, 18 pages, 9 figures

Performing inference on hundreds of thousands of samples with large language models (LLMs) can be computationally and financially costly. We propose batch prompting, a simple alternative prompting approach that enables the LLM to run inference in batches, instead of one sample at a time. Our method reduces both token and time costs while retaining downstream performance. We theoretically demonstrate that under a few-shot in-context learning setting, the inference costs decrease almost inverse linearly with the number of samples in each batch. We extensively validate the effectiveness of batch prompting on ten datasets across commonsense QA, arithmetic reasoning, and NLI/NLU: batch prompting significantly~(up to $5\times$ with six samples in batch) reduces the LLM (Codex) inference token and time costs while achieving better or comparable performance. Our analysis shows that the number of samples in each batch and the complexity of tasks affect its performance. Further, batch prompting can be applied across different LLMs and reasoning methods.

翻译：对具有大语言模型的数十万个样本进行推断,可以计算成本,而且费用也很高。我们建议批量催化,这是一种简单的替代催化方法,使LLM能够分批进行推断,而不是一次进行一次抽样。我们的方法在保持下游性能的同时降低了象征性和时间成本。我们理论上表明,在几张文本学习环境中,推论成本随着每批样本的数量而几乎线性地下降。我们广泛验证了批量在常见QA、算术推理和NLI/NLU的十套数据中进行催化的有效性:批量大量催化~(最多5美元,每批有6个样本)降低LLMM(Codex)推论和时间成本,同时实现更好或可比的性能。我们的分析表明,每批样本的数量和任务的复杂性会影响其性能。此外,批量的催化可适用于不同的LMS和推理方法。

0

相关内容

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

324+阅读 · 2020年11月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

MaPIP1;1介导香蕉响应干旱胁迫的分子调控机制解析

国家自然科学基金

0+阅读 · 2015年12月31日

卟啉酞菁类分子光伏材料的设计合成和性能研究

国家自然科学基金

0+阅读 · 2014年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

流域极端水文事件对气候变化的多维响应及预估研究

国家自然科学基金

0+阅读 · 2013年12月31日

东北黑土地区农业生产人为源气溶胶排放模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

SDIR1互作蛋白ECA1在植物应对干旱胁迫过程中的功能分析

国家自然科学基金

0+阅读 · 2012年12月31日

适应气候变化的多尺度农艺景观格局调控及模拟研究

国家自然科学基金

0+阅读 · 2012年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

典型矿区生态产业发展模式研究

国家自然科学基金

2+阅读 · 2009年12月31日

sRAGE对缺血/再灌注的心脏保护作用及其机制

国家自然科学基金

0+阅读 · 2008年12月31日

Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference

Arxiv

0+阅读 · 2023年3月14日

Efficient Speech Translation with Dynamic Latent Perceivers

Arxiv

0+阅读 · 2023年3月14日

Lite DETR : An Interleaved Multi-Scale Encoder for Efficient DETR

Arxiv

0+阅读 · 2023年3月13日

Adaptive Machine Translation with Large Language Models

Arxiv

0+阅读 · 2023年3月13日

High-throughput Generative Inference of Large Language Models with a Single GPU

Arxiv

0+阅读 · 2023年3月13日

Self-planning Code Generation with Large Language Model

Arxiv

0+阅读 · 2023年3月12日

On the Feasibility of Specialized Ability Stealing for Large Language Code Models

Arxiv

0+阅读 · 2023年3月10日

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

Arxiv

12+阅读 · 2021年12月30日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Arxiv

12+阅读 · 2020年6月23日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

324+阅读 · 2020年11月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【NeurIPS2025】VideoLucy：用于长视频理解的深度记忆回溯机制

不确定环境下无人机与无人地面车辆编队的地下勘探规划算法 | 122页

【NTU博士论文】端到端鲁棒自动语音识别的最新进展

用于强化学习的扩散模型：基础、分类与发展

相关资讯

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】GAN架构入门综述(资源汇总)

【推荐】GAN架构入门综述(资源汇总)

机器学习研究会

10+阅读 · 2017年9月3日

相关论文

Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference

Arxiv

0+阅读 · 2023年3月14日

Efficient Speech Translation with Dynamic Latent Perceivers

Arxiv

0+阅读 · 2023年3月14日

Lite DETR : An Interleaved Multi-Scale Encoder for Efficient DETR

Arxiv

0+阅读 · 2023年3月13日

Adaptive Machine Translation with Large Language Models

Arxiv

0+阅读 · 2023年3月13日

High-throughput Generative Inference of Large Language Models with a Single GPU

Arxiv

0+阅读 · 2023年3月13日

Self-planning Code Generation with Large Language Model

Arxiv

0+阅读 · 2023年3月12日

On the Feasibility of Specialized Ability Stealing for Large Language Code Models

Arxiv

0+阅读 · 2023年3月10日

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

Arxiv

12+阅读 · 2021年12月30日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Arxiv

12+阅读 · 2020年6月23日

相关基金

MaPIP1;1介导香蕉响应干旱胁迫的分子调控机制解析

国家自然科学基金

0+阅读 · 2015年12月31日

卟啉酞菁类分子光伏材料的设计合成和性能研究

国家自然科学基金

0+阅读 · 2014年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

流域极端水文事件对气候变化的多维响应及预估研究

国家自然科学基金

0+阅读 · 2013年12月31日

东北黑土地区农业生产人为源气溶胶排放模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

SDIR1互作蛋白ECA1在植物应对干旱胁迫过程中的功能分析

国家自然科学基金

0+阅读 · 2012年12月31日

适应气候变化的多尺度农艺景观格局调控及模拟研究

国家自然科学基金

0+阅读 · 2012年12月31日

Witten Laplacian的特征值及与其相关的Ricci Soliton研究

国家自然科学基金

0+阅读 · 2012年12月31日

典型矿区生态产业发展模式研究

国家自然科学基金

2+阅读 · 2009年12月31日

sRAGE对缺血/再灌注的心脏保护作用及其机制

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员