Natural language models are often summarized through a high-dimensional set of descriptive metrics, including training corpus size, training time, the number of trainable parameters, inference time, and evaluation statistics that assess performance across tasks. The high-dimensional nature of these metrics makes objective model comparison difficult; in particular, it is hard to assess the trade-off each model makes between performance and resources (compute time, memory, etc.). We apply Data Envelopment Analysis (DEA) to the problem of assessing this resource-performance trade-off. DEA is a nonparametric method that measures the productive efficiency of abstract units that consume one or more inputs and yield at least one output. We recast natural language models as units suitable for DEA, and we show that DEA provides an effective framework for quantifying model performance and efficiency. A central feature of DEA is that it identifies the subset of models that lie on an efficient frontier of performance. DEA is also scalable, having been applied to problems with thousands of units. We report empirical results of DEA applied to 14 language models spanning a variety of architectures, and we show that DEA identifies a subset of models that effectively balance resource demands against performance.
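The abstract does not fix a particular DEA formulation; as a point of reference, one standard choice is the input-oriented CCR multiplier model of Charnes, Cooper, and Rhodes. It scores a unit $o$ (here, a language model) with inputs $x_o \in \mathbb{R}^m_{+}$ (e.g., parameter count, training time) and outputs $y_o \in \mathbb{R}^s_{+}$ (e.g., benchmark scores) by solving the linear program

\[
\theta_o^{*} \;=\; \max_{u \ge 0,\; v \ge 0} \; u^\top y_o
\quad \text{s.t.} \quad v^\top x_o = 1, \qquad u^\top y_k - v^\top x_k \le 0 \;\; \text{for all units } k .
\]

Units with $\theta_o^{*} = 1$ lie on the efficient frontier, and one such LP is solved per unit, which is what lets DEA scale to thousands of units. A minimal sketch of this computation, assuming SciPy is available and using entirely hypothetical model metrics (not the 14 models studied in the paper):

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_efficiency(X, Y):
    """Input-oriented CCR efficiency scores via the multiplier LP.

    X: (n_units, n_inputs)  resource metrics (smaller is better)
    Y: (n_units, n_outputs) performance metrics (larger is better)
    Returns scores in (0, 1]; a score of 1 marks the efficient frontier.
    """
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables z = [u_1..u_s, v_1..v_m]: output/input weights.
        c = np.concatenate([-Y[o], np.zeros(m)])                   # maximize u . y_o
        A_eq = np.concatenate([np.zeros(s), X[o]]).reshape(1, -1)  # v . x_o = 1
        A_ub = np.hstack([Y, -X])                                  # u . y_k - v . x_k <= 0
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                      A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (s + m))
        scores[o] = -res.fun
    return scores

# Hypothetical inputs (billions of parameters, GPU-days of training) and a
# single benchmark-accuracy output for three made-up models.
X = np.array([[1.3, 12.0], [6.7, 55.0], [13.0, 130.0]])
Y = np.array([[0.61], [0.70], [0.72]])
print(dea_ccr_efficiency(X, Y).round(3))
```

Each unit is scored against the weights most favorable to it, subject to no unit exceeding an efficiency of 1 under those same weights, so no metric-specific normalization or hand-tuned weighting is required.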