Incorporating BERT into Parallel Sequence Decoding with Adapters

While large-scale pre-trained language models such as BERT have achieved great success on various natural language understanding tasks, how to incorporate them efficiently and effectively into sequence-to-sequence models and the corresponding text generation tasks remains a non-trivial problem. In this paper, we address this problem by taking two different BERT models as the encoder and decoder respectively, and fine-tuning them through simple and lightweight adapter modules, which are inserted between BERT layers and tuned on the task-specific dataset. In this way, we obtain a flexible and efficient model that jointly leverages the information contained in the source-side and target-side BERT models while avoiding the catastrophic forgetting problem. Each component in the framework can be considered a plug-in unit, making the framework flexible and task-agnostic. Because BERT is bi-directional and predicts tokens conditionally independently, our framework is based on a parallel sequence decoding algorithm named Mask-Predict, and it can also be adapted easily to traditional autoregressive decoding. We conduct extensive experiments on neural machine translation tasks, where the proposed method consistently outperforms autoregressive baselines while reducing the inference latency by half, and achieves $36.49$/$33.57$ BLEU scores on IWSLT14 German-English/WMT14 German-English translation. When adapted to autoregressive decoding, the proposed method achieves $30.60$/$43.56$ BLEU scores on WMT14 English-German/English-French translation, on par with the state-of-the-art baseline models.
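The abstract names Mask-Predict as the parallel decoding algorithm but does not spell out its steps. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes a linear-decay re-masking schedule and a known target length, neither of which is stated in the abstract. The `Predictor` type, `num_to_mask`, and `mask_predict` are hypothetical names introduced for illustration; in a real system the predictor would be the adapter-equipped BERT decoder conditioned on the encoder output.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"

# Hypothetical predictor interface (an assumption for this sketch): given the
# current target tokens, some of them masked, it returns a (token, probability)
# pair for every position.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def num_to_mask(length: int, step: int, total_steps: int) -> int:
    """Assumed linear-decay schedule: re-mask fewer tokens at each later step."""
    return int(length * (total_steps - step) / total_steps)


def mask_predict(predict: Predictor, length: int, total_steps: int) -> List[str]:
    """Iteratively fill a fully masked target of a given (assumed known) length."""
    tokens = [MASK] * length
    probs = [0.0] * length
    for step in range(1, total_steps + 1):
        # All masked positions are predicted in one parallel pass.
        outputs = predict(tokens)
        for i, (tok, p) in enumerate(outputs):
            if tokens[i] == MASK:
                tokens[i], probs[i] = tok, p
        # Re-mask the least confident positions for the next pass.
        n = num_to_mask(length, step, total_steps)
        if n == 0:
            break
        worst = sorted(range(length), key=lambda i: probs[i])[:n]
        for i in worst:
            tokens[i] = MASK
    return tokens


# Toy usage: a stand-in predictor that always proposes the same sentence.
target = ["we", "like", "parallel", "decoding"]


def toy_predictor(tokens: List[str]) -> List[Tuple[str, float]]:
    return [(t, 0.9) for t in target]


decoded = mask_predict(toy_predictor, length=len(target), total_steps=2)
```

The number of predictor calls is bounded by `total_steps` rather than by the sequence length, which is where the latency advantage over token-by-token autoregressive decoding would come from under these assumptions.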
