BERT has shown great success in a wide variety of NLP tasks. However, it struggles with long inputs due to its attention mechanism, whose cost grows quadratically with sequence length. Longformer, ETC, and BigBird addressed this issue and effectively mitigated the quadratic dependency problem. However, we find that these models are not sufficient, and we propose LittleBird, a novel model based on BigBird with improved speed and memory footprint while maintaining accuracy. In particular, we devise a more flexible and efficient position representation method based on Attention with Linear Biases (ALiBi). We also show that replacing BigBird's method of representing global information with pack and unpack attention is more effective. The proposed model can work on long inputs even after being pre-trained on short inputs, and can be trained efficiently by reusing an existing pre-trained language model for short inputs. This is a significant benefit for low-resource languages, where large amounts of long text data are difficult to obtain. As a result, our experiments show that LittleBird works very well in a variety of languages, achieving high performance on question answering tasks, particularly on KorQuAD2.0, a Korean question answering dataset for long paragraphs.
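The position representation the abstract refers to builds on ALiBi, which replaces learned position embeddings with a fixed linear penalty added to attention scores: each head subtracts a head-specific slope times the query-key distance, so more distant tokens are attended to less. A minimal sketch of the original ALiBi bias (the geometric slope schedule follows the ALiBi paper; LittleBird's modified variant is not shown here):

```python
import numpy as np

def alibi_bias(num_heads, seq_len):
    """Build per-head linear attention biases of shape (num_heads, seq_len, seq_len)."""
    # Head-specific slopes: geometric sequence 2^(-8/n), 2^(-16/n), ...
    slopes = np.array([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    pos = np.arange(seq_len)
    # (j - i): zero on the diagonal, increasingly negative for tokens further in the past
    distance = pos[None, :] - pos[:, None]
    # Bias added to raw attention scores before the softmax; no parameters are learned
    return slopes[:, None, None] * distance[None, :, :]
```

Because the penalty depends only on relative distance, the bias extrapolates to sequence lengths never seen in pre-training, which is what allows a model pre-trained on short inputs to be applied to long ones.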