On a wide range of natural language processing and information retrieval tasks, transformer-based models, particularly pre-trained language models such as BERT, have proven highly effective. Due to the quadratic complexity of the self-attention mechanism, however, such models struggle to process long documents. Recent approaches to this issue include truncating long documents, which discards potentially relevant information; segmenting them into several passages, which may also miss information and incurs a high computational cost when the number of passages is large; or modifying the self-attention mechanism to make it sparser, as in sparse-attention models, again at the risk of missing some information. We follow a slightly different approach: we first select key blocks of a long document by local query-block pre-ranking, and then aggregate a few blocks into a short document that can be processed by a model such as BERT. Experiments conducted on standard Information Retrieval datasets demonstrate the effectiveness of the proposed approach.
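The sketch below illustrates the two-step idea described above: split a long document into fixed-size blocks, pre-rank the blocks locally against the query, and aggregate a few top-ranked blocks (in their original order) into a short document that fits a BERT-style input limit. The block size, the number of blocks kept, and the simple term-overlap scorer are illustrative assumptions, not the paper's actual pre-ranking model or hyper-parameters.

```python
# Minimal sketch of query-block pre-ranking and block aggregation.
# The lexical-overlap scorer and all sizes below are assumptions made
# for illustration; the actual pre-ranking signal may differ.
from collections import Counter
from typing import List


def split_into_blocks(tokens: List[str], block_size: int = 64) -> List[List[str]]:
    """Split a tokenized document into fixed-size, non-overlapping blocks."""
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


def block_score(query: List[str], block: List[str]) -> float:
    """Toy local relevance score: frequency of query terms inside the block."""
    counts = Counter(block)
    return float(sum(counts[term] for term in set(query)))


def select_key_blocks(query: List[str], tokens: List[str],
                      block_size: int = 64, max_tokens: int = 512) -> List[str]:
    """Pre-rank blocks against the query, keep the top ones in document order,
    and concatenate them into a short document within the encoder's limit."""
    blocks = split_into_blocks(tokens, block_size)
    ranked = sorted(range(len(blocks)),
                    key=lambda i: block_score(query, blocks[i]),
                    reverse=True)
    budget = max_tokens // block_size      # how many blocks fit in the limit
    kept = sorted(ranked[:budget])         # restore original block order
    short_doc: List[str] = []
    for i in kept:
        short_doc.extend(blocks[i])
    return short_doc                       # fed to BERT as a short document


if __name__ == "__main__":
    query = "transformer long document retrieval".split()
    document = (" ".join(["filler"] * 1000)
                + " transformer models for long document retrieval").split()
    short_doc = select_key_blocks(query, document)
    print(len(short_doc), "of", len(document), "tokens kept for the BERT input")
```

In this toy run, only the block containing the query terms and a few leading blocks survive the budget, so the aggregated input stays under 512 tokens while retaining the locally most relevant content.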