In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding window pattern to aggregate information from neighbors. Its second level employs a larger window with pooling attention to enlarge the receptive field while reducing both computational cost and memory consumption. We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance.
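To make the two-level schema concrete, below is a minimal, single-head sketch of the idea: a small window attended exactly (level one) and a larger window whose keys and values are first compressed by pooling before attention (level two). The window sizes `w1` and `w2`, the pooling kernel, the use of mean pooling, and the joint softmax over both levels are illustrative assumptions for this sketch, not the paper's exact configuration or implementation; the per-token loop favors clarity over efficiency.

```python
import torch
import torch.nn.functional as F


def two_level_pooling_attention(q, k, v, w1=2, w2=8, pool=4):
    """Simplified two-level attention for one head.

    q, k, v: (seq_len, dim) tensors. Returns a (seq_len, dim) tensor.
    w1: half-width of the small first-level window (exact attention).
    w2: half-width of the larger second-level window (pooled attention).
    pool: pooling kernel/stride applied to second-level keys and values.
    """
    n, d = q.shape
    scale = d ** -0.5
    out = torch.empty_like(q)
    for i in range(n):
        # Level 1: exact attention over a small local window around token i.
        lo1, hi1 = max(0, i - w1), min(n, i + w1 + 1)
        k1, v1 = k[lo1:hi1], v[lo1:hi1]

        # Level 2: a larger window whose keys/values are compressed by
        # average pooling, shrinking the number of attended positions
        # (and hence compute and memory) by roughly a factor of `pool`.
        lo2, hi2 = max(0, i - w2), min(n, i + w2 + 1)
        k2 = F.avg_pool1d(k[lo2:hi2].t().unsqueeze(0), pool, ceil_mode=True).squeeze(0).t()
        v2 = F.avg_pool1d(v[lo2:hi2].t().unsqueeze(0), pool, ceil_mode=True).squeeze(0).t()

        # For simplicity, attend jointly over first-level and pooled
        # second-level keys with a single softmax.
        keys = torch.cat([k1, k2], dim=0)
        vals = torch.cat([v1, v2], dim=0)
        attn = torch.softmax(q[i] @ keys.t() * scale, dim=-1)
        out[i] = attn @ vals
    return out


if __name__ == "__main__":
    x = torch.randn(32, 64)
    print(two_level_pooling_attention(x, x, x).shape)  # torch.Size([32, 64])
```

The point of the sketch is the asymmetry between the two levels: the expensive exact attention is confined to a small neighborhood, while long-range context enters only through pooled keys and values, which is what keeps the overall cost well below full self-attention on long inputs.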