Transformers achieve state-of-the-art performance on many tasks in speech, vision, and natural language processing, among others. Self-attention, a crucial contributor to this performance, has quadratic computational complexity, which makes training on longer input sequences challenging. Prior work has produced state-of-the-art transformer variants with linear attention; however, current models sacrifice performance to achieve efficient implementations. In this work, we develop a novel linear transformer by examining the properties of the key-query product within self-attention. Our model outperforms state-of-the-art approaches on speech recognition and speech summarization, yielding a 1% absolute WER improvement on the Librispeech-100 speech recognition benchmark and on a new INTERVIEW speech recognition benchmark, and a 5-point ROUGE improvement for summarization on How2.
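To make the quadratic-versus-linear contrast concrete, the sketch below illustrates the general idea from the prior linear-attention literature, not the model proposed here: standard attention materializes the n x n key-query product, whereas a kernelized variant with a positive feature map (the `elu_plus_one` map and function names are illustrative assumptions) reorders the computation so the cost grows linearly in sequence length.

```python
import numpy as np


def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1, as used in some prior
    # linear-attention work; any positive feature map would serve here.
    return np.where(x > 0, x + 1.0, np.exp(x))


def softmax_attention(Q, K, V):
    # Standard self-attention: forming the (n, n) key-query product makes
    # the cost quadratic in sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])              # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V


def linear_attention(Q, K, V, phi=elu_plus_one):
    # Kernelized linear attention (illustrative): compute phi(K)^T V once,
    # a (d, d_v) summary, so the n x n matrix is never formed and the cost
    # is O(n d^2) rather than O(n^2 d).
    Qp, Kp = phi(Q), phi(K)                              # feature-mapped q, k
    KV = Kp.T @ V                                        # (d, d_v)
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T             # (n, 1) normalizer
    return (Qp @ KV) / (Z + 1e-9)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 4
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(softmax_attention(Q, K, V).shape)              # (8, 4)
    print(linear_attention(Q, K, V).shape)               # (8, 4)
```

The two functions return tensors of the same shape but differ in how the key-query interaction is evaluated; the linear variant's accuracy depends on the chosen feature map, which is the kind of performance trade-off the abstract refers to.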