The Transformer architecture has become increasingly popular over the past two years, owing to its impressive performance on a number of natural language processing (NLP) tasks. However, all Transformer computations occur at the level of word representations; it may therefore be argued that Transformer models do not explicitly attempt to learn hierarchical structure, which is widely assumed to be integral to language. In the present work, we introduce hierarchical processing into the Transformer model, taking inspiration from the U-Net architecture, popular in computer vision for its hierarchical view of natural images. We empirically demonstrate that the proposed architecture outperforms both the vanilla Transformer and some strong baselines in the domain of chit-chat dialogue.
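The U-Net-style hierarchy alluded to above can be sketched in miniature: token representations are progressively downsampled into coarser, phrase-like representations, then upsampled back with skip connections. This is a hypothetical NumPy illustration of the contract-and-expand pattern, not the paper's actual model (which would interleave Transformer layers at each resolution); the function names and pooling choices here are our own assumptions.

```python
import numpy as np

def avg_pool(x, k=2):
    # Halve the sequence length by averaging adjacent token vectors
    # (zero-padding if the length is not divisible by k).
    T, d = x.shape
    pad = (-T) % k
    if pad:
        x = np.vstack([x, np.zeros((pad, d))])
    return x.reshape(-1, k, d).mean(axis=1)

def upsample(x, T):
    # Nearest-neighbour upsampling back to the original length T.
    return np.repeat(x, 2, axis=0)[:T]

def unet_pass(x, depth=2):
    # U-Net style pass over token representations: contract to coarser
    # resolutions, then expand, adding skip connections at each level.
    skips = []
    for _ in range(depth):
        skips.append(x)
        x = avg_pool(x)  # coarser, "phrase-level" representations
    for skip in reversed(skips):
        x = upsample(x, skip.shape[0]) + skip  # skip connection
    return x

tokens = np.random.randn(7, 16)  # 7 tokens, 16-dim embeddings
out = unet_pass(tokens)
print(out.shape)  # the output keeps the input's token resolution: (7, 16)
```

In the full architecture, the pooling and upsampling steps would be learned layers with Transformer blocks operating at each resolution; the sketch only shows how the input's token-level resolution is recovered at the output, which is what makes skip connections possible.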