We present BERTGEN, a novel generative, decoder-only model that extends BERT by fusing the multimodal and multilingual pre-trained models VL-BERT and M-BERT, respectively. BERTGEN is trained auto-regressively for language generation tasks, namely image captioning, machine translation, and multimodal machine translation, under a multitask setting. With a comprehensive set of evaluations, we show that BERTGEN outperforms many strong baselines across the tasks explored. We also demonstrate BERTGEN's ability for zero-shot language generation, where its performance is competitive with supervised counterparts. Finally, we conduct ablation studies which show that BERTGEN substantially benefits from multi-tasking and effectively transfers relevant inductive biases from the pre-trained models.
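The abstract describes BERTGEN's decoding only at a high level. As a rough illustration of how a BERT-style (bidirectional) encoder can be reused as an auto-regressive generator, the sketch below appends a [MASK] token at each step and fills it greedily with the model's prediction. This is a minimal toy example, not BERTGEN itself: the class, the token ids, and the randomly initialized weights are all hypothetical, whereas the actual model initializes from fused VL-BERT and M-BERT checkpoints and conditions on image features and source-language tokens.

```python
import torch
import torch.nn as nn

# Hypothetical token ids and vocabulary size; BERTGEN inherits these
# from the M-BERT tokenizer rather than defining them itself.
MASK_ID, EOS_ID, VOCAB = 103, 102, 30522

class ToyBertGenerator(nn.Module):
    """Toy BERT-style encoder with an LM head, used here as a generator."""
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, VOCAB)

    def forward(self, ids):
        # Full (bidirectional) self-attention over the current sequence.
        return self.lm_head(self.encoder(self.embed(ids)))

@torch.no_grad()
def generate(model, src_ids, max_len=20):
    """Greedy mask-predict decoding: append [MASK], predict it, repeat."""
    ids = src_ids
    for _ in range(max_len):
        ids = torch.cat([ids, torch.tensor([[MASK_ID]])], dim=1)
        logits = model(ids)                       # (1, len, vocab)
        next_id = logits[0, -1].argmax().item()   # fill the [MASK] slot
        ids[0, -1] = next_id
        if next_id == EOS_ID:                     # stop at end-of-sequence
            break
    return ids

model = ToyBertGenerator().eval()
out = generate(model, torch.tensor([[101, 7592, 2088]]))  # made-up source ids
print(out)
```

Re-appending a [MASK] and predicting it lets a model pre-trained with masked-language-modeling objectives produce tokens left to right without an architectural decoder, which is what makes the "decoder-only" reuse of BERT-family checkpoints possible.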