Large Language Models (LLMs) like GPT-3 have sparked significant interest in their generative capabilities, leading to the development of various commercial applications. The high cost of using these models drives application builders to maximize the value of generation under a limited inference budget. This paper presents a study of optimizing inference hyperparameters such as the number of responses, temperature, and max tokens, which significantly affect the utility and cost of text generation. We design a framework named EcoOptiGen which leverages economical hyperparameter optimization and cost-based pruning. Experiments with the latest GPT-3.5 models on a variety of tasks verify its effectiveness. EcoOptiGen is implemented in the FLAML library: https://github.com/microsoft/FLAML, and we provide one example of using it at: https://microsoft.github.io/FLAML/docs/Examples/Integrate%20-%20OpenAI.
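To make the tuning problem concrete, the sketch below illustrates a naive cost-capped random search over the inference hyperparameters named above (number of responses, temperature, max tokens). This is only an illustration of the problem setting, not the EcoOptiGen algorithm or the FLAML API; the model name, pricing constant, budget, and helper functions are hypothetical placeholders, and the code assumes the legacy OpenAI Python SDK (pre-1.0) from the GPT-3.5 era.

```python
# Minimal conceptual sketch (NOT the EcoOptiGen implementation): random search over
# inference hyperparameters with a crude cost cap, using the legacy OpenAI SDK (<1.0).
import random
import openai  # assumes `pip install "openai<1.0"` and OPENAI_API_KEY set

# Hypothetical search space over the hyperparameters mentioned in the abstract.
SEARCH_SPACE = {
    "n": [1, 2, 4],                  # number of responses per prompt
    "temperature": [0.2, 0.7, 1.0],
    "max_tokens": [64, 128, 256],
}
PRICE_PER_1K_TOKENS = 0.002          # illustrative price; real pricing varies by model
BUDGET_USD = 0.50                    # assumed total inference budget for the search


def sample_config():
    """Draw one random hyperparameter configuration from the search space."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}


def evaluate(config, prompts, checkers):
    """Return (utility, cost) of one config on a small validation set.

    `checkers` is a list of functions that judge whether a generated text is acceptable.
    """
    correct, cost = 0, 0.0
    for prompt, is_correct in zip(prompts, checkers):
        resp = openai.Completion.create(
            model="gpt-3.5-turbo-instruct",  # placeholder model choice
            prompt=prompt,
            **config,
        )
        cost += resp["usage"]["total_tokens"] / 1000 * PRICE_PER_1K_TOKENS
        # Count the prompt as solved if any of the n responses passes the checker.
        correct += any(is_correct(choice["text"]) for choice in resp["choices"])
    return correct / len(prompts), cost


def search(prompts, checkers, num_trials=20):
    """Pick the configuration with the best validation utility within the budget."""
    spent, best_cfg, best_utility = 0.0, None, -1.0
    for _ in range(num_trials):
        if spent >= BUDGET_USD:      # simple stopping rule; EcoOptiGen instead prunes per-config
            break
        cfg = sample_config()
        utility, cost = evaluate(cfg, prompts, checkers)
        spent += cost
        if utility > best_utility:
            best_cfg, best_utility = cfg, utility
    return best_cfg, best_utility
```

The paper's framework replaces this brute-force loop with economical hyperparameter optimization and cost-based pruning; the linked FLAML example shows the actual usage.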