The new generation of pre-trained NLP models pushes the state of the art (SOTA) to new limits, but at such a cost in computational resources that their use in real production environments is often prohibitively expensive. We tackle this problem by evaluating not only the standard quality metrics on downstream tasks but also memory footprint and inference time. We present MOROCCO, a framework for comparing language models compatible with the \texttt{jiant} environment, which supports over 50 NLU tasks, including the SuperGLUE benchmark and multiple probing suites. We demonstrate its applicability on two GLUE-like suites in different languages.
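As a rough illustration of reporting resource usage alongside quality, the following minimal sketch times a prediction function and records its peak Python heap usage. It is not the MOROCCO implementation: the names `profile_inference` and `toy_model` are hypothetical, and `tracemalloc` only sees allocations made through Python's allocator, so it would not capture GPU memory or native buffers of a real model.

```python
import time
import tracemalloc
from statistics import median


def profile_inference(predict, inputs, warmup=1, repeats=5):
    """Return median per-example latency (seconds) and peak traced memory (bytes).

    `predict` is any callable taking one input; this helper is a hypothetical
    illustration, not part of jiant or MOROCCO.
    """
    # Warm-up passes so one-off setup costs do not distort the timings.
    for _ in range(warmup):
        for x in inputs:
            predict(x)

    # Timing pass: run without memory tracing, which would slow execution.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for x in inputs:
            predict(x)
        timings.append((time.perf_counter() - start) / len(inputs))

    # Memory pass: trace Python-level allocations during one full run.
    tracemalloc.start()
    for x in inputs:
        predict(x)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {"latency_s": median(timings), "peak_bytes": peak_bytes}


def toy_model(text):
    # Hypothetical stand-in for a real model's forward pass.
    return sum(ord(ch) for ch in text) % 2


if __name__ == "__main__":
    print(profile_inference(toy_model, ["a short sentence", "another one"]))
```

Timing and memory are measured in separate passes because tracing allocations adds overhead that would otherwise inflate the latency figure.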