In this work, we extensively redesign the recently introduced Fourier-transform-based token-mixing method (FNET), which replaces the computationally expensive self-attention mechanism, and apply it in a full transformer implementation to a long-document summarization task (> 512 tokens). As baselines, we also carried out long-document summarization with established methods such as the Longformer and Big Bird transformer models, which can process over 8000 tokens and are currently the state-of-the-art approaches for this type of problem. The original FNET paper implemented Fourier token mixing in an encoder-only architecture, whereas abstractive summarization requires both an encoder and a decoder. Since no such pretrained transformer model is currently publicly available, we implemented a full encoder/decoder transformer based on this Fourier token-mixing approach and trained it from scratch, initializing the word representations with GloVe embeddings. We investigated a number of extensions to the original FNET architecture and evaluated them by their ROUGE F1 scores on the summarization task. All modifications outperformed a transformer built with the original FNET encoder on the summarization task.
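For context, the core idea being redesigned is the FNET mixing sublayer: self-attention is replaced by a parameter-free 2D discrete Fourier transform over the sequence and hidden dimensions, keeping only the real part. The following is a minimal illustrative sketch of such an encoder block, written here in PyTorch as an assumption (the paper does not specify its framework, and class names such as `FourierMixing` and `FNetEncoderBlock` are hypothetical); the decoder side and the extensions studied in this work are not shown.

```python
import torch
import torch.nn as nn


class FourierMixing(nn.Module):
    """Parameter-free token mixing via a 2D discrete Fourier transform (FNET-style)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden). FFT over the hidden dim, then over the
        # sequence dim; only the real part is kept so later layers stay real-valued.
        return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real


class FNetEncoderBlock(nn.Module):
    """One encoder block: Fourier mixing + feed-forward, each with residual + LayerNorm."""

    def __init__(self, hidden: int, ff_dim: int, dropout: float = 0.1):
        super().__init__()
        self.mixing = FourierMixing()
        self.norm1 = nn.LayerNorm(hidden)
        self.ff = nn.Sequential(
            nn.Linear(hidden, ff_dim), nn.GELU(), nn.Linear(ff_dim, hidden)
        )
        self.norm2 = nn.LayerNorm(hidden)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm1(x + self.dropout(self.mixing(x)))
        x = self.norm2(x + self.dropout(self.ff(x)))
        return x


# Usage sketch: mixing a batch of embedded tokens (e.g. GloVe vectors projected
# to the model's hidden size) for a sequence longer than 512 tokens.
if __name__ == "__main__":
    block = FNetEncoderBlock(hidden=256, ff_dim=1024)
    tokens = torch.randn(2, 600, 256)
    print(block(tokens).shape)  # torch.Size([2, 600, 256])
```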