Modern language models leverage increasingly large numbers of parameters to improve performance on natural language understanding tasks. Ensembling these models in specific configurations for downstream tasks shows even further performance improvements. In this paper, we analyze bagging of language models, comparing single language models to bagged ensembles that are roughly equivalent in final model size. We explore an array of model bagging configurations for natural language understanding tasks, with final ensemble sizes ranging from 300M to 1.5B parameters, and find that our ensembling methods are at best roughly equivalent to single LM baselines. Based on our experimental findings, we also note other positive effects of bagging and pruning in specific scenarios, such as variance reduction and minor performance improvements.
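For context on the aggregation step referred to above, the sketch below illustrates bagging-style inference for a classification NLU task: class probabilities from several independently trained ensemble members are averaged before taking the argmax. This is a minimal illustration with dummy stand-in members, not the paper's implementation; the function names `bagged_predict` and `make_dummy_member` are hypothetical.

```python
import numpy as np

def bagged_predict(models, inputs):
    """Average class probabilities from independently trained members
    (the standard bagging aggregation rule) and return argmax labels."""
    probs = np.mean([m(inputs) for m in models], axis=0)  # (batch, num_classes)
    return probs.argmax(axis=-1)

def make_dummy_member(seed, num_classes=3):
    """Toy stand-in for a fine-tuned LM classifier. A real member would be,
    e.g., a language model fine-tuned on a bootstrap resample of the task data."""
    rng = np.random.default_rng(seed)
    def member(inputs):
        logits = rng.normal(size=(len(inputs), num_classes))
        e = np.exp(logits - logits.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)  # softmax over classes
    return member

# Usage: five members voting on a two-example batch.
ensemble = [make_dummy_member(s) for s in range(5)]
print(bagged_predict(ensemble, inputs=["sentence a", "sentence b"]))
```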