In natural language processing (NLP), the context of a word or sentence plays an essential role. Contextual information, such as the semantic representation of a passage or of dialogue history, forms an essential part of a conversation and of precisely understanding the current phrase or sentence. However, despite their great success in modeling sequence alignment, standard attention mechanisms typically generate weights from queries and keys alone and ignore context, forming a Bi-Attention framework. This Bi-Attention mechanism does not explicitly model the interactions among the contexts, queries, and keys of target sequences, so it misses important contextual information and yields poor attention performance. Accordingly, a novel and general triple-attention (Tri-Attention) framework expands the standard Bi-Attention mechanism and explicitly models interactions among queries, keys, and context by incorporating context as a third dimension in calculating relevance scores. Four variants of Tri-Attention are generated by expanding the two-dimensional, vector-based additive, dot-product, scaled dot-product, and bilinear operations in Bi-Attention to tensor operations for Tri-Attention. Extensive experiments on three NLP tasks demonstrate that Tri-Attention outperforms about 30 state-of-the-art non-attention, standard Bi-Attention, and contextual Bi-Attention approaches, as well as pretrained neural language models.
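To make the idea of "context as a third dimension" concrete, the following is a minimal NumPy sketch of one plausible Tri-Attention variant, a trilinear generalization of the scaled dot-product score. It is an illustrative assumption rather than the paper's exact formulation; the function names (`bi_attention_scores`, `tri_attention_scores`) and the softmax placement are hypothetical choices made for this sketch.

```python
import numpy as np

def bi_attention_scores(Q, K, d):
    """Standard scaled dot-product Bi-Attention: scores of shape (n_q, n_k)."""
    return Q @ K.T / np.sqrt(d)

def tri_attention_scores(Q, K, C, d):
    """Hypothetical trilinear Tri-Attention: each (query i, key j) pair also
    interacts with each context vector l, giving a third-order score tensor
    of shape (n_q, n_k, n_c).

    e[i, j, l] = sum_m Q[i, m] * K[j, m] * C[l, m] / sqrt(d)
    """
    return np.einsum('im,jm,lm->ijl', Q, K, C) / np.sqrt(d)

# Toy example: 4 query positions, 5 key positions, 3 context vectors, dim 8.
rng = np.random.default_rng(0)
d = 8
Q = rng.normal(size=(4, d))
K = rng.normal(size=(5, d))
C = rng.normal(size=(3, d))

scores = tri_attention_scores(Q, K, C, d)
# Normalize over keys (axis 1), giving attention weights per (query, context) pair.
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print(scores.shape, weights.shape)  # (4, 5, 3) (4, 5, 3)
```

Under this assumed formulation, the other variants would analogously lift the additive, (unscaled) dot-product, and bilinear scoring functions from vector operations over query-key pairs to tensor operations over query-key-context triples.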