What Makes Convolutional Models Great on Long Sequence Modeling?

Convolutional models have been widely used in multiple domains. However, most existing models use only local convolution, which leaves them unable to handle long-range dependencies efficiently. Attention overcomes this problem by aggregating global information, but it also makes the computational complexity quadratic in the sequence length. Recently, Gu et al. [2021] proposed a model called S4, inspired by the state space model. S4 can be efficiently implemented as a global convolutional model whose kernel size equals the input sequence length. S4 can model much longer sequences than Transformers and achieves significant gains over SoTA on several long-range tasks. Despite its empirical success, S4 is involved: it requires sophisticated parameterization and initialization schemes. As a result, S4 is less intuitive and hard to use. Here we aim to demystify S4 and extract the basic principles that contribute to its success as a global convolutional model. We focus on the structure of the convolution kernel and identify two critical but intuitive principles enjoyed by S4 that are sufficient to make up an effective global convolutional model: 1) The parameterization of the convolutional kernel needs to be efficient, in the sense that the number of parameters should scale sub-linearly with the sequence length. 2) The kernel needs to satisfy a decaying structure, in which the weights for convolving with closer neighbors are larger than those for more distant ones. Based on these two principles, we propose a simple yet effective convolutional model called Structured Global Convolution (SGConv). SGConv exhibits strong empirical performance over several tasks: 1) With faster speed, SGConv surpasses S4 on the Long Range Arena and Speech Command datasets. 2) When plugged into standard language and vision models, SGConv shows the potential to improve both efficiency and performance.
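The two principles can be illustrated with a minimal NumPy sketch. It is not the authors' SGConv implementation, since the abstract does not spell out the construction. It shows one possible way, under stated assumptions, to build a sequence-length kernel from a small set of parameters (sub-kernels that are upsampled to doubling widths, so the parameter count grows roughly logarithmically with length) and to shrink each more distant block by a fixed factor. The names `build_decaying_kernel` and `global_causal_conv`, the `decay` value, and the doubling layout are all hypothetical choices made for this illustration.

```python
import numpy as np


def build_decaying_kernel(params, seq_len, decay=0.5):
    """Assemble a length-`seq_len` kernel from a few small sub-kernels.

    `params` has shape (num_scales, d). Block i is upsampled to width
    d * 2**max(i - 1, 0) and scaled by decay**i, so distant positions get
    smaller weights. This layout is an assumption made for illustration.
    """
    num_scales, d = params.shape
    src = np.linspace(0.0, 1.0, d)
    pieces = []
    for i in range(num_scales):
        width = d * 2 ** max(i - 1, 0)
        dst = np.linspace(0.0, 1.0, width)
        # Linear interpolation stretches the d parameters over `width` positions.
        piece = np.interp(dst, src, params[i])
        pieces.append((decay ** i) * piece)
    kernel = np.concatenate(pieces)[:seq_len]
    if kernel.shape[0] < seq_len:
        # Pad with zeros if the blocks do not cover the whole sequence.
        kernel = np.pad(kernel, (0, seq_len - kernel.shape[0]))
    return kernel


def global_causal_conv(x, kernel):
    """Causal convolution of a 1-D signal with a kernel of the same length.

    Zero-padding to 2 * n makes the FFT (circular) product equal to the
    linear convolution, at O(n log n) cost instead of O(n**2).
    """
    n = x.shape[0]
    fft_len = 2 * n
    spectrum = np.fft.rfft(x, fft_len) * np.fft.rfft(kernel, fft_len)
    return np.fft.irfft(spectrum, fft_len)[:n]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d, num_scales = 64, 8, 4   # 8 + 8 + 16 + 32 = 64 positions
    params = rng.standard_normal((num_scales, d))
    kernel = build_decaying_kernel(params, seq_len)
    x = rng.standard_normal(seq_len)
    y = global_causal_conv(x, kernel)
    print(params.size, "parameters for a kernel of length", kernel.shape[0])
    print("output shape:", y.shape)
```

In this sketch, doubling the sequence length adds only one more row of `d` parameters, which is one way to satisfy the sub-linear scaling in principle 1, while the `decay ** i` factor imposes the decaying structure in principle 2.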

