BERT has shown a great deal of success in a wide variety of NLP tasks, but it is limited in handling long inputs because of the quadratic cost of its attention mechanism. Longformer, ETC, and BigBird address this issue and effectively solve the quadratic dependency problem. However, we find that these models are still not sufficient, and we propose LittleBird, a novel model based on BigBird that improves speed and memory footprint while maintaining accuracy. In particular, we devise a more flexible and efficient position representation method based on Attention with Linear Biases (ALiBi). We also show that replacing the global-information mechanism of BigBird with pack and unpack attention is more effective. The proposed model can work on long inputs even after being pre-trained on short inputs, and can be trained efficiently by reusing an existing pre-trained language model for short inputs. This is a significant benefit for low-resource languages, where large amounts of long text data are difficult to obtain. Our experiments show that LittleBird works very well across a variety of languages, achieving high performance on question answering tasks, particularly on KorQuAD2.0, a Korean question answering dataset for long paragraphs.
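The abstract builds on ALiBi, which replaces learned position embeddings with a linear penalty on attention scores that grows with token distance. The following is a minimal sketch of that general idea (not the authors' implementation): a single-head, bidirectional variant in which the function name and the slope value `m` are hypothetical, shown only to illustrate why such a bias lets a model run on inputs longer than those seen during pre-training.

```python
import numpy as np

def attention_with_alibi_bias(q, k, v, m=0.5):
    """q, k, v: (seq_len, d) arrays; m: head-specific slope (hypothetical value)."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                     # scaled dot-product scores
    pos = np.arange(seq_len)
    bias = -m * np.abs(pos[:, None] - pos[None, :])   # linear penalty on token distance
    scores = scores + bias                            # no learned position embeddings
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ v

# The bias depends only on relative distance, so the same function applies
# unchanged to sequences longer than those used in pre-training.
q = k = v = np.random.randn(8, 16)
out = attention_with_alibi_bias(q, k, v)
```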