A benchmark provides an ecosystem for measuring the progress of models with standard datasets and automatic and human evaluation metrics. We introduce IndoNLG, the first such benchmark for natural language generation (NLG) in the Indonesian language. It covers six tasks: summarization, question answering, open-domain chitchat, and machine translation across three language pairs. We provide a vast and clean pre-training corpus of Indonesian, Sundanese, and Javanese data called Indo4B-Plus, which we use to train our pre-trained NLG model, IndoBART. We assess the effectiveness and efficiency of IndoBART through extensive evaluation on all IndoNLG tasks. Our findings show that IndoBART achieves competitive performance on Indonesian tasks with five times fewer parameters than the largest multilingual model in our benchmark, mBART-LARGE (Liu et al., 2020), and nearly 4x and 2.5x faster inference on CPU and GPU, respectively. We additionally demonstrate that IndoBART can learn Javanese and Sundanese, achieving decent performance on the corresponding machine translation tasks.