The multi-head self-attention mechanism of the transformer model has been thoroughly investigated in recent years. In one line of research, researchers seek to understand why and how transformers work. In another, researchers propose new attention augmentation methods to make transformers more accurate, efficient, and interpretable. In this paper, we combine these two lines of research in a human-in-the-loop pipeline that first discovers important task-specific attention patterns. Those patterns are then injected not only into smaller models but also into the original model. The benefits of our pipeline and the discovered patterns are demonstrated in two case studies on extractive summarization and topic segmentation. After discovering interpretable patterns in BERT-based models fine-tuned for the two downstream tasks, we inject them into attention heads; experiments indicate that the resulting models achieve considerable improvements in accuracy and efficiency.
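As a rough illustration of what injecting a fixed attention pattern into a head could look like, the following sketch blends a head's learned attention weights with a predefined, row-stochastic pattern matrix. This is our own minimal example, not the paper's actual mechanism; the function, the blending coefficient `alpha`, and the "previous token" pattern are all hypothetical.

```python
import torch
import torch.nn.functional as F

def attention_with_injected_pattern(q, k, v, pattern, alpha=0.5):
    """Single-head attention whose weights are blended with a fixed,
    task-specific pattern (hypothetical injection scheme).

    q, k, v: (seq_len, d) tensors.
    pattern: (seq_len, seq_len) row-stochastic matrix encoding a discovered
             pattern, e.g. "attend to the previous token".
    alpha:   interpolation weight between learned and injected attention.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-1, -2) / d ** 0.5
    learned = F.softmax(scores, dim=-1)
    # Convex combination of the learned weights and the injected pattern.
    weights = (1 - alpha) * learned + alpha * pattern
    return weights @ v

# Example: inject a "previous token" pattern into a toy head.
seq_len, d = 6, 8
q, k, v = (torch.randn(seq_len, d) for _ in range(3))
prev_token = torch.roll(torch.eye(seq_len), shifts=-1, dims=1)
prev_token[0] = torch.eye(seq_len)[0]  # first token attends to itself
out = attention_with_injected_pattern(q, k, v, prev_token)
```

Other injection schemes (e.g. masking logits rather than mixing probabilities) are equally plausible; the choice depends on the pattern being enforced and the head being modified.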