Large-scale pre-trained language models have shown remarkable results in diverse NLP applications. Unfortunately, these performance gains have been accompanied by a significant increase in computation time and model size, stressing the need for new or complementary strategies to improve the efficiency of these models. In this paper, we propose DACT-BERT, a differentiable adaptive computation time strategy for BERT-like models. DACT-BERT adds an adaptive computational mechanism to BERT's regular processing pipeline, which controls the number of Transformer blocks that need to be executed at inference time. By doing this, the model learns to combine the most appropriate intermediate representations for the task at hand. Our experiments demonstrate that, compared to the baselines, our approach excels in reduced computational regimes and is competitive in less restrictive ones.
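The adaptive mechanism described above can be illustrated with a minimal sketch: each Transformer block emits an output and a halting score, the outputs are combined as a running convex mixture, and computation stops once the remaining blocks can no longer meaningfully change the accumulated answer. This is a simplified, hypothetical rendering (scalar outputs, a made-up threshold rule), not the paper's actual implementation.

```python
# Sketch of a DACT-style accumulator over per-block outputs.
# Assumptions (not from the paper): scalar block outputs, a simple
# remaining-mass threshold as the early-exit criterion.

def dact_combine(block_outputs, halting_probs, threshold=0.05):
    """Combine intermediate representations with halting scores h_n via
    a_n = h_n * y_n + (1 - h_n) * a_{n-1}; exit early when the product
    of (1 - h_i) -- the influence left for later blocks -- is tiny."""
    accumulated = 0.0
    remainder = 1.0  # how much later blocks can still move the answer
    steps = 0
    for y, h in zip(block_outputs, halting_probs):
        accumulated = h * y + (1.0 - h) * accumulated  # convex update
        remainder *= (1.0 - h)
        steps += 1
        if remainder < threshold:  # later blocks barely matter; stop
            break
    return accumulated, steps

# With confident halting scores, only 2 of 4 blocks are executed:
acc, steps = dact_combine([1.0, 2.0, 3.0, 4.0], [0.9, 0.9, 0.9, 0.9])
```

At inference time this skips the remaining blocks outright, which is where the compute savings come from; at training time the full mixture stays differentiable, so the halting scores can be learned end-to-end.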