Pre-trained BERT models have achieved impressive accuracy on natural language processing (NLP) tasks. However, their large number of parameters hinders their efficient deployment on edge devices. Binarizing BERT models can significantly alleviate this issue but comes with a severe accuracy drop compared with their full-precision counterparts. In this paper, we propose an efficient and robust binary ensemble BERT (BEBERT) to bridge the accuracy gap. To the best of our knowledge, this is the first work employing ensemble techniques on binary BERTs, yielding BEBERT, which achieves superior accuracy while retaining computational efficiency. Furthermore, we remove the knowledge distillation procedures during ensembling to speed up the training process without compromising accuracy. Experimental results on the GLUE benchmark show that the proposed BEBERT significantly outperforms existing binary BERT models in accuracy and robustness with a 2x speedup in training time. Moreover, BEBERT incurs only a negligible accuracy loss of 0.3% compared to the full-precision baseline while saving 15x and 13x in FLOPs and model size, respectively. In addition, BEBERT outperforms other compressed BERTs in accuracy by up to 6.7%.
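To illustrate the idea of ensembling several binarized models at inference time, the following is a minimal, self-contained PyTorch sketch. It assumes simple logit averaging over independently trained binarized members; the abstract does not specify the ensemble rule, and the class and function names (BinaryLinear, TinyBinaryClassifier, ensemble_predict) are illustrative placeholders rather than the paper's implementation.

```python
import torch
import torch.nn as nn


class BinaryLinear(nn.Linear):
    """Linear layer whose weights are binarized to {-1, +1} at inference.

    A simplified stand-in for the binarized layers inside a binary BERT;
    real binary BERTs also binarize activations and use learned scaling.
    """

    def forward(self, x):
        scale = self.weight.abs().mean()          # per-layer scaling factor
        w_bin = torch.sign(self.weight) * scale    # sign binarization
        return nn.functional.linear(x, w_bin, self.bias)


class TinyBinaryClassifier(nn.Module):
    """Toy classifier standing in for one binarized BERT ensemble member."""

    def __init__(self, hidden=128, num_labels=2):
        super().__init__()
        self.encoder = BinaryLinear(hidden, hidden)
        self.head = BinaryLinear(hidden, num_labels)

    def forward(self, features):
        return self.head(torch.relu(self.encoder(features)))


def ensemble_predict(models, features):
    """Average the logits of all ensemble members, then take the argmax."""
    with torch.no_grad():
        logits = torch.stack([m(features) for m in models]).mean(dim=0)
    return logits.argmax(dim=-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    members = [TinyBinaryClassifier() for _ in range(3)]  # e.g. 3 binary members
    batch = torch.randn(4, 128)                           # placeholder [CLS] features
    print(ensemble_predict(members, batch))
```

Because each member stores binary weights, the ensemble's memory and FLOP overhead grows far more slowly than an equally sized ensemble of full-precision models, which is the efficiency argument behind combining binarization with ensembling.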