Attention is a commonly used mechanism in sequence processing, but its O(n^2) complexity prevents its application to long sequences. The recently introduced neural Shuffle-Exchange network offers a computation-efficient alternative, enabling the modelling of long-range dependencies in O(n log n) time. The model, however, is quite complex, involving a sophisticated gating mechanism derived from the Gated Recurrent Unit. In this paper, we present a simple and lightweight variant of the Shuffle-Exchange network, based on a residual network employing GELU and Layer Normalization. The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy. It surpasses the Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription while remaining parameter-efficient. We show how to combine the improved Shuffle-Exchange network with convolutional layers, establishing it as a useful building block in long sequence processing applications.
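To make the described architecture concrete, the following is a minimal sketch of a residual Shuffle-Exchange block in PyTorch: a switch unit that applies LayerNorm, a GELU feed-forward transformation, and a residual connection to adjacent element pairs, alternated with a perfect-shuffle permutation so that log2(n) layers connect all positions in O(n log n) total work. All names (ResidualSwitchUnit, ShuffleExchangeBlock, hidden_mult) and the exact layer composition are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn


class ResidualSwitchUnit(nn.Module):
    """Illustrative residual switch unit: LayerNorm -> Linear -> GELU -> Linear
    applied to each pair of adjacent sequence elements, with a residual connection.
    (Hypothetical simplification of the unit described in the paper.)"""

    def __init__(self, dim, hidden_mult=2):
        super().__init__()
        self.norm = nn.LayerNorm(2 * dim)
        self.ff = nn.Sequential(
            nn.Linear(2 * dim, hidden_mult * 2 * dim),
            nn.GELU(),
            nn.Linear(hidden_mult * 2 * dim, 2 * dim),
        )

    def forward(self, x):
        # x: (batch, n, dim) with n a power of two
        b, n, d = x.shape
        pairs = x.reshape(b, n // 2, 2 * d)        # join adjacent elements into pairs
        out = pairs + self.ff(self.norm(pairs))    # residual feed-forward on each pair
        return out.reshape(b, n, d)


def perfect_shuffle(x):
    """Riffle-shuffle permutation along the sequence axis: interleave the two halves."""
    b, n, d = x.shape
    return x.reshape(b, 2, n // 2, d).transpose(1, 2).reshape(b, n, d)


class ShuffleExchangeBlock(nn.Module):
    """Alternates switch units with perfect shuffles; log2(n) layers suffice to
    connect every pair of positions, giving O(n log n) total computation."""

    def __init__(self, dim, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(ResidualSwitchUnit(dim) for _ in range(num_layers))

    def forward(self, x):
        for switch in self.layers:
            x = perfect_shuffle(switch(x))
        return x


# Example usage on a toy batch: sequence length 16 = 2**4, so 4 layers span all positions.
x = torch.randn(4, 16, 32)
y = ShuffleExchangeBlock(dim=32, num_layers=4)(x)
```

In this sketch the quadratic all-pairs interaction of attention is replaced by a fixed butterfly-like connectivity pattern, which is what keeps the cost at O(n log n); such a block can be interleaved with convolutional layers as the abstract suggests.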