Despite their great success in the Natural Language Processing (NLP) field, large pre-trained language models such as BERT are ill-suited to resource-constrained or real-time applications owing to their large number of parameters and slow inference speed. Recently, compressing and accelerating BERT have become important topics. By incorporating a parameter-sharing strategy, ALBERT greatly reduces the number of parameters while achieving competitive performance. Nevertheless, ALBERT still suffers from long inference times. In this work, we propose ELBERT, which significantly improves the average inference speed over ALBERT through the proposed confidence-window based early exit mechanism, without introducing additional parameters or extra training overhead. Experimental results show that ELBERT achieves an adaptive inference speedup ranging from 2$\times$ to 10$\times$ with negligible accuracy degradation relative to ALBERT on various datasets. Moreover, ELBERT attains higher accuracy than existing early exit methods used for accelerating BERT under the same computation cost. Furthermore, to understand the principle of the early exit mechanism, we visualize its decision-making process in ELBERT.
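To make the idea of confidence-window based early exit concrete, the following is a minimal sketch, not the paper's actual implementation: after each (shared) layer, an internal classifier produces a confidence score, and inference stops once the confidence has stayed above a threshold for a window of consecutive layers. The threshold, window size, and function signatures here are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logits vector."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def early_exit_inference(layer_fns, classifier, x, threshold=0.9, window=2):
    """Illustrative confidence-window early exit (hypothetical parameters).

    Runs layers sequentially; exits once the top-class probability has
    stayed >= `threshold` for `window` consecutive layers.
    Returns (predicted_class, layers_used).
    """
    recent = []          # confidences from the most recent layers
    hidden = x
    for i, layer in enumerate(layer_fns, start=1):
        hidden = layer(hidden)
        probs = softmax(classifier(hidden))
        recent.append(probs.max())
        if len(recent) > window:
            recent.pop(0)
        # Exit early only when the whole window is confident enough.
        if len(recent) == window and all(c >= threshold for c in recent):
            return int(np.argmax(probs)), i
    # Fell through: use the final layer's prediction.
    return int(np.argmax(probs)), len(layer_fns)
```

Easy inputs exit after a few layers while hard ones use the full stack, which is what yields an input-adaptive speedup; the window (rather than a single-layer check) guards against exiting on a transient confidence spike.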