Many natural language processing tasks rely solely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies via soft probabilities between every two tokens, but they are neither effective nor efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and hard attention into one context fusion model, "reinforced self-attention (ReSA)", so that they benefit each other. In ReSA, a hard attention trims a sequence for a soft self-attention to process, while the soft attention feeds reward signals back to facilitate the training of the hard one. For this purpose, we develop a novel hard attention called "reinforced sequence sampling (RSS)", which selects tokens in parallel and is trained via policy gradient. Using two RSS modules, ReSA efficiently extracts the sparse dependencies between each pair of selected tokens. Finally, we propose an RNN/CNN-free sentence-encoding model, "reinforced self-attention network (ReSAN)", based solely on ReSA. It achieves state-of-the-art performance on both the Stanford Natural Language Inference (SNLI) and Sentences Involving Compositional Knowledge (SICK) datasets.
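The following is a minimal PyTorch sketch of the idea described above, not the authors' implementation: a hard "reinforced sequence sampling" (RSS) module samples a binary keep/drop mask over tokens in parallel, and soft self-attention is restricted to pairs of selected tokens, with the hard part trained by policy gradient. All module names, dimensions, and the reward definition here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RSS(nn.Module):
    """Hard attention sketch: sample a keep/drop decision per token, in parallel."""

    def __init__(self, d_model):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, x):                                    # x: (batch, seq_len, d_model)
        probs = torch.sigmoid(self.scorer(x)).squeeze(-1)    # selection probabilities
        mask = torch.bernoulli(probs)                        # hard, non-differentiable samples
        log_prob = (mask * torch.log(probs + 1e-8)
                    + (1 - mask) * torch.log(1 - probs + 1e-8)).sum(-1)
        return mask, log_prob                                # log_prob feeds the policy gradient


class ReSA(nn.Module):
    """Soft self-attention restricted to token pairs selected by two RSS modules."""

    def __init__(self, d_model):
        super().__init__()
        self.rss_q = RSS(d_model)   # selects query positions
        self.rss_k = RSS(d_model)   # selects key positions
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x):
        mask_q, logp_q = self.rss_q(x)
        mask_k, logp_k = self.rss_k(x)
        scores = self.q(x) @ self.k(x).transpose(1, 2) / x.size(-1) ** 0.5
        # Attend only between selected (query, key) pairs.
        pair_mask = mask_q.unsqueeze(2) * mask_k.unsqueeze(1)
        scores = scores.masked_fill(pair_mask == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        attn = torch.nan_to_num(attn)            # rows with no selected keys become zeros
        out = attn @ self.v(x)
        return out, logp_q + logp_k


# Usage sketch: the downstream task loss serves as (negative) reward for the RSS
# policies, trained with REINFORCE; the soft attention trains by ordinary backprop.
x = torch.randn(2, 7, 16)
resa = ReSA(16)
out, log_prob = resa(x)
task_loss = out.mean()                           # stand-in for a real task loss
reward = -task_loss.detach()
loss = task_loss + (-reward * log_prob).mean()   # policy-gradient term for the hard attention
loss.backward()
```

In this sketch the selection mask is sampled independently per token, which is what makes the hard attention parallel; the REINFORCE term is a simplified stand-in for whatever reward shaping the paper uses.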