Attention is typically used to select informative sub-phrases that are used for prediction. This paper investigates the novel use of attention as a form of feature augmentation, i.e., casted attention. We propose Multi-Cast Attention Networks (MCAN), a new attention mechanism and general model architecture for a potpourri of ranking tasks in the conversational modeling and question answering domains. Our approach performs a series of soft attention operations, each time casting a scalar feature upon the inner word embeddings. The key idea is to provide a real-valued hint (feature) to a subsequent encoder layer, with the aim of improving the representation learning process. There are several advantages to this design, e.g., it allows an arbitrary number of attention mechanisms to be casted, allowing multiple attention types (e.g., co-attention, intra-attention) and attention variants (e.g., alignment-pooling, max-pooling, mean-pooling) to be executed simultaneously. This not only eliminates the costly need to tune the nature of the co-attention layer, but also provides a greater extent of explainability to practitioners. Via extensive experiments on four well-known benchmark datasets, we show that MCAN achieves state-of-the-art performance. On the Ubuntu Dialogue Corpus, MCAN outperforms existing state-of-the-art models by $9\%$. MCAN also achieves the best score reported to date on the well-studied TrecQA dataset.
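To make the "casting" idea concrete, below is a minimal sketch of one co-attention cast in Python/NumPy: query and document words are soft-aligned, each attended vector is compressed into a single scalar (here via a sum of an element-wise product, one of several possible compression choices), and that scalar is concatenated onto the word embeddings before the encoder. Function and variable names, the dot-product affinity, and the compression function are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention_cast(q_emb, d_emb):
    """One attention 'cast': soft-align query/document words, then compress
    each attended vector into a scalar feature per word.

    q_emb: (len_q, dim) query word embeddings
    d_emb: (len_d, dim) document word embeddings
    Returns scalar features of shape (len_q,) and (len_d,) to be
    concatenated onto the embeddings fed to the encoder layer.
    (Hypothetical sum-style compression; the paper considers several.)"""
    sim = q_emb @ d_emb.T                    # (len_q, len_d) affinity matrix
    q_att = softmax(sim, axis=1) @ d_emb     # each query word attends over doc
    d_att = softmax(sim, axis=0).T @ q_emb   # each doc word attends over query
    q_feat = (q_emb * q_att).sum(axis=1)     # compress interaction to a scalar
    d_feat = (d_emb * d_att).sum(axis=1)
    return q_feat, d_feat

# Multi-cast: each attention variant contributes one extra scalar dimension.
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8))                  # 5 query words, 8-dim embeddings
d = rng.normal(size=(7, 8))                  # 7 document words
qf, df = co_attention_cast(q, d)
q_augmented = np.concatenate([q, qf[:, None]], axis=1)   # (5, 9) -> encoder
d_augmented = np.concatenate([d, df[:, None]], axis=1)   # (7, 9) -> encoder
```

In the full model, several such casts (e.g., max-pooling, mean-pooling, and alignment-pooling variants of co-attention, plus intra-attention) would each append their own scalar feature before the shared encoder.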