We present BARTpho in two versions, BARTpho_word and BARTpho_syllable, the first public large-scale monolingual sequence-to-sequence models pre-trained for Vietnamese. BARTpho uses the "large" architecture and pre-training scheme of the sequence-to-sequence denoising model BART, making it especially suitable for generative NLP tasks. Experiments on a downstream task of Vietnamese text summarization show that, in both automatic and human evaluations, BARTpho outperforms the strong baseline mBART and improves the state-of-the-art. We release BARTpho to facilitate future research and applications in generative Vietnamese NLP. Our BARTpho models are available at: https://github.com/VinAIResearch/BARTpho
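As an illustration of how the released checkpoints might be loaded, below is a minimal sketch using the Hugging Face transformers library. The checkpoint identifier vinai/bartpho-syllable and the example sentence are assumptions not stated in this abstract; see the repository above for the authoritative usage instructions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint name; the repository linked above lists the
# official identifiers for both BARTpho_word and BARTpho_syllable.
tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable")
model = AutoModel.from_pretrained("vinai/bartpho-syllable")

# Encode an example Vietnamese sentence and extract its
# contextual representations from the pre-trained encoder-decoder.
sentence = "Chúng tôi là những nghiên cứu viên."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    features = model(**inputs)
```

For a generative task such as summarization, one would instead fine-tune the model with a sequence-to-sequence head (e.g., AutoModelForSeq2SeqLM) on task-specific data.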