In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed using representations from different encoder layers to provide diverse levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise cross-view decoding, where for each decoder layer, the representations from the last encoder layer, which serve as a global view, are supplemented with those from other encoder layers to form a stereoscopic view of the source sequences. Systematic experiments show that we successfully address the hierarchy bypassing problem and substantially improve the performance of sequence-to-sequence learning with deep representations on diverse tasks.
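The mechanism described above can be sketched in a minimal form: each decoder layer attends to the last encoder layer (the global view) and additionally to a lower encoder layer (the cross view), and the two attention outputs are fused. The layer pairing, the single-head dot-product attention, and the simple averaging fusion below are illustrative assumptions for the sketch, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, keys, values):
    # Standard scaled dot-product attention (single head, no projections).
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

def cross_view_decode(query, encoder_layers, cross_layer_idx):
    """Attend to the last encoder layer (global view) and to one lower
    encoder layer (cross view), then fuse the two outputs.

    query:           (tgt_len, d) decoder-side queries
    encoder_layers:  list of (src_len, d) arrays, one per encoder layer
    cross_layer_idx: index of the supplementary encoder layer
                     (the layer-to-layer pairing here is an assumption)
    """
    global_view = attention(query, encoder_layers[-1], encoder_layers[-1])
    cross_view = attention(query, encoder_layers[cross_layer_idx],
                           encoder_layers[cross_layer_idx])
    # Fusion by simple averaging; the paper's fusion may differ.
    return 0.5 * (global_view + cross_view)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, src_len, tgt_len, n_layers = 8, 5, 3, 6
    encoder_layers = [rng.standard_normal((src_len, d)) for _ in range(n_layers)]
    query = rng.standard_normal((tgt_len, d))
    # Decoder layer 2 supplements the global view with encoder layer 2.
    out = cross_view_decode(query, encoder_layers, cross_layer_idx=2)
    print(out.shape)  # (3, 8)
```

Because the cross view reaches into lower encoder layers, gradients flow directly into the middle of the encoder stack during training, which is how the approach counteracts the hierarchy bypassing problem described above.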