Convolutions operate only locally, thus failing to model global interactions. Self-attention, however, is able to learn representations that capture long-range dependencies in sequences. We propose a network architecture for audio super-resolution that combines convolution and self-attention. Attention-based Feature-Wise Linear Modulation (AFiLM) uses a self-attention mechanism instead of recurrent neural networks to modulate the activations of the convolutional model. Extensive experiments show that our model outperforms existing approaches on standard benchmarks. Moreover, it allows for more parallelization, resulting in significantly faster training.
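As a rough illustration of the idea, the following PyTorch sketch shows one way an attention-based FiLM layer could modulate convolutional activations: the feature map is max-pooled into blocks, a self-attention layer (taking the place of the recurrent modulator in standard FiLM) processes the block sequence, and the attended features are mapped to per-channel scale and shift parameters. The block size, layer sizes, and names here are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class AFiLM(nn.Module):
    """Sketch of an attention-based FiLM layer: a self-attention
    modulator (instead of an RNN) produces per-block scale/shift
    parameters that feature-wise modulate conv activations.
    Hyperparameters are illustrative assumptions."""

    def __init__(self, channels: int, block_size: int, n_heads: int = 4):
        super().__init__()
        self.block_size = block_size
        # Pool the feature map into non-overlapping blocks along time.
        self.pool = nn.MaxPool1d(kernel_size=block_size)
        # Single Transformer encoder layer as the modulating network.
        self.attn = nn.TransformerEncoderLayer(
            d_model=channels, nhead=n_heads, batch_first=True
        )
        # Map attended block features to (scale, shift) per channel.
        self.to_film = nn.Linear(channels, 2 * channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); time must be divisible by block_size.
        blocks = self.pool(x).transpose(1, 2)              # (b, n_blocks, c)
        blocks = self.attn(blocks)                         # attention over blocks
        gamma, beta = self.to_film(blocks).chunk(2, dim=-1)
        # Broadcast each block's (gamma, beta) over its time steps.
        gamma = gamma.transpose(1, 2).repeat_interleave(self.block_size, dim=2)
        beta = beta.transpose(1, 2).repeat_interleave(self.block_size, dim=2)
        return gamma * x + beta

# Example usage with assumed shapes:
layer = AFiLM(channels=64, block_size=128)
x = torch.randn(8, 64, 1024)   # e.g. a batch of conv feature maps
y = layer(x)                   # same shape, feature-wise modulated
```

Because the modulator is attention-based rather than recurrent, all blocks are processed in parallel, which is the source of the faster training mentioned above.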