The multi-head self-attention of popular transformer models is widely used within Natural Language Processing (NLP), including for the task of extractive summarization. With the goal of analyzing and pruning the parameter-heavy self-attention mechanism, multiple approaches have proposed more parameter-light self-attention alternatives. In this paper, we present a novel parameter-lean self-attention mechanism using discourse priors. Our new tree self-attention is based on document-level discourse information, extending the recently proposed "Synthesizer" framework with another lightweight alternative. We show empirically that our tree self-attention approach achieves competitive ROUGE scores on the task of extractive summarization. Compared to the original single-head transformer model, the tree-attention approach reaches similar performance on both the EDU and the sentence level, despite the significant reduction of parameters in the attention component. With a more balanced hyper-parameter setting, we further significantly outperform the 8-head transformer model on the sentence level while requiring an order of magnitude fewer parameters.
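The abstract does not spell out how the discourse prior enters the attention computation; purely as a rough illustration of a Synthesizer-style, parameter-lean attention with a structural prior, the sketch below replaces learned query/key projections with attention weights derived from pairwise discourse-tree distances. The class name `TreeSelfAttention`, the `tree_dist` input, and the distance-based softmax are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TreeSelfAttention(nn.Module):
    """Parameter-lean self-attention sketch: the learned query/key
    projections of standard dot-product attention are replaced by a fixed
    attention prior derived from a discourse tree.

    The prior used here is a placeholder assumption: negative pairwise
    tree distance between discourse units (EDUs or sentences),
    softmax-normalised into attention weights. Only the value projection
    carries learnable parameters.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, tree_dist: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_units, d_model); tree_dist: (batch, n_units, n_units)
        # Units that are closer in the discourse tree get higher weight.
        attn = F.softmax(-tree_dist, dim=-1)
        return attn @ self.value(x)


# Usage: 2 documents, 5 discourse units each, 128-dim representations.
x = torch.randn(2, 5, 128)
tree_dist = torch.randint(0, 4, (2, 5, 5)).float()
out = TreeSelfAttention(128)(x, tree_dist)
print(out.shape)  # torch.Size([2, 5, 128])
```

Because the attention matrix is fixed by the document's discourse structure rather than learned per head, the attention component needs only the value projection, which is where the parameter savings in such a design would come from.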