Learning Sequence Representations by Non-local Recurrent Neural Memory

The key challenge of sequence representation learning is to capture long-range temporal dependencies. Typical methods for supervised sequence representation learning are built upon recurrent neural networks. One potential limitation of these methods is that they explicitly model only first-order information interactions, that is, interactions between adjacent time steps in a sequence, so the high-order interactions between nonadjacent time steps are not fully exploited. This greatly limits their capability to model long-range temporal dependencies, since the temporal features learned from first-order interactions cannot be maintained over a long term due to temporal information dilution and vanishing gradients. To tackle this limitation, we propose the Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning, which performs non-local operations by means of a self-attention mechanism to learn full-order interactions within a sliding temporal memory block, and models global interactions between memory blocks in a gated recurrent manner. Consequently, our model is able to capture long-range dependencies. In addition, the latent high-level features contained in high-order interactions can be distilled by our model. We validate the effectiveness and generality of NRNM on three types of sequence applications across different modalities: sequence classification, step-wise sequential prediction, and sequence similarity learning. Our model compares favorably against other state-of-the-art methods specifically designed for each of these sequence applications.
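The two mechanisms named in the abstract, self-attention inside a sliding memory block and a gated recurrent update between blocks, can be illustrated with a minimal NumPy sketch. This is not the authors' formulation: the function names (`self_attention`, `memory_pass`), the single sigmoid gate, the weight shapes, and the choice to attend jointly over the previous memory and the current block are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_attention(rows, w_q, w_k, w_v):
    # Scaled dot-product self-attention: every row attends to every row,
    # so interactions are not restricted to adjacent time steps.
    q, k, v = rows @ w_q, rows @ w_k, rows @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

def memory_pass(hidden, window, stride, params):
    # hidden: (T, d) hidden states, e.g. produced by an RNN backbone.
    # Returns one memory state of shape (window, d) per block position.
    w_q, w_k, w_v, w_g = params          # assumed shapes: (d, d) x3, (2d, d)
    d = hidden.shape[1]
    memory = np.zeros((window, d))
    memories = []
    for start in range(0, hidden.shape[0] - window + 1, stride):
        block = hidden[start:start + window]
        # Non-local step: attend jointly over previous memory and current block.
        joint = np.concatenate([memory, block], axis=0)      # (2 * window, d)
        attended = self_attention(joint, w_q, w_k, w_v)
        candidate = attended[window:]                        # rows of the current block
        # Gated recurrent step (assumed form): blend old memory with new content.
        gate = sigmoid(np.concatenate([memory, candidate], axis=1) @ w_g)
        memory = gate * memory + (1.0 - gate) * np.tanh(candidate)
        memories.append(memory)
    return np.stack(memories)

# Usage with random weights and inputs (illustrative values only).
rng = np.random.default_rng(0)
d, window, stride, T = 8, 4, 2, 12
params = (rng.normal(size=(d, d)), rng.normal(size=(d, d)),
          rng.normal(size=(d, d)), rng.normal(size=(2 * d, d)))
out = memory_pass(rng.normal(size=(T, d)), window, stride, params)
print(out.shape)
```

In this sketch the attention step gives every position in the block direct access to every other position and to the carried-over memory, while the gate decides how much of the earlier memory survives into the next block. That pairing is the intuition behind combining full-order interactions within a block with recurrent propagation across blocks.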
