Many natural language processing problems require encoding a text sequence as a fixed-length vector, which usually involves an aggregation step that combines the representations of all the words, such as pooling or self-attention. However, these widely used aggregation approaches do not take higher-order relationships among words into consideration. Hence, we propose a new way of obtaining aggregation weights, called eigen-centrality self-attention. More specifically, we build a fully connected graph over all the words in a sentence and compute the eigen-centrality of each word as its attention score. Explicitly modeling the relationships among words as a graph captures higher-order dependencies, which helps us achieve better results on five text classification tasks and the SNLI task than baseline models such as pooling, self-attention, and dynamic routing. To obtain the eigen-centrality measure, we adopt the power method to compute the dominant eigenvector of the graph. Moreover, we derive an iterative approach to compute the gradient of the power method, which reduces both memory consumption and computation cost.
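To make the aggregation concrete, here is a minimal sketch of eigen-centrality self-attention in PyTorch. The affinity function (scaled dot product with a column-wise softmax), the fixed number of power iterations, and all names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def eigen_centrality_attention(h: torch.Tensor, num_iters: int = 10) -> torch.Tensor:
    """Encode a sentence as a fixed-length vector.

    h: word representations of shape (seq_len, dim).
    Returns an aggregated vector of shape (dim,).
    """
    # Build a fully connected word graph: non-negative affinities between
    # every pair of words. The column-wise softmax is an assumed choice
    # that makes A column-stochastic, so its dominant eigenvalue is 1.
    scores = h @ h.t() / h.size(-1) ** 0.5   # (seq_len, seq_len)
    A = F.softmax(scores, dim=0)             # adjacency matrix

    # Power method: iterate x <- A x / ||A x|| to approximate the dominant
    # eigenvector, whose entries are the words' eigen-centralities.
    x = torch.full((h.size(0),), 1.0 / h.size(0))
    for _ in range(num_iters):
        x = A @ x
        x = x / x.norm()

    # Normalize the centralities into attention weights and aggregate.
    alpha = x / x.sum()
    return alpha @ h                          # weighted sum of word vectors
```

As written, the gradient is obtained by backpropagating through the unrolled iterations, which stores every intermediate `x`; the iterative gradient scheme mentioned in the abstract would instead compute this gradient without retaining those intermediates, trading the memory cost for an extra iterative pass.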