In NLP, convolutional neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because the attention in CNNs has been mainly implemented as attentive pooling (i.e., it is applied to pooling) rather than as attentive convolution (i.e., it is integrated into convolution). Convolution is the differentiator of CNNs in that it can powerfully model the higher-level representation of a word by taking into account its local fixed-size context in the input text t^x. In this work, we propose an attentive convolution network, ATTCONV. It extends the context scope of the convolution operation, deriving higher-level features for a word not only from local context, but also from information extracted from nonlocal context by the attention mechanism commonly used in RNNs. This nonlocal context can come (i) from parts of the input text t^x that are distant or (ii) from extra (i.e., external) contexts t^y. Experiments on sentence modeling with zero context (sentiment analysis), single context (textual entailment), and multiple contexts (claim verification) demonstrate the effectiveness of ATTCONV in sentence representation learning with the incorporation of context. In particular, attentive convolution outperforms attentive pooling and is a strong competitor to popular attentive RNNs.
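To make the idea concrete, the following is a minimal NumPy sketch of one way a convolution step could integrate an attention-derived context vector, as opposed to applying attention only at the pooling stage. The dot-product scoring, the parameter names (W_local, W_att), and the shapes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_convolution(X, Y, W_local, W_att, b, window=3):
    """Sketch of an attentive convolution layer.

    X: (n, d) word vectors of the input text t^x
    Y: (m, d) word vectors of a context text t^y (Y can be X itself,
       so that attention covers distant parts of t^x)
    W_local: (d_out, window * d) filter over the local window
    W_att:   (d_out, d)          filter over the attentive context
    b:       (d_out,)            bias
    Returns H: (n, d_out) higher-level word representations.
    """
    n, d = X.shape
    pad = window // 2
    X_pad = np.vstack([np.zeros((pad, d)), X, np.zeros((pad, d))])
    H = np.zeros((n, W_local.shape[0]))
    for i in range(n):
        # Attention of word i over all context words; dot-product
        # scoring is an assumption made for brevity.
        alpha = softmax(Y @ X[i])
        c_i = alpha @ Y                           # attentive context vector
        local = X_pad[i:i + window].reshape(-1)   # local fixed-size window
        # Convolution output combines the local window with the
        # nonlocal, attention-derived context.
        H[i] = np.tanh(W_local @ local + W_att @ c_i + b)
    return H
```

In this sketch, setting Y = X corresponds to case (i), attention over distant parts of t^x, while passing a separate text gives case (ii), attention over an external context t^y.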