Computer vision has benefited from initializing multiple deep layers with weights pretrained on large supervised training sets like ImageNet. Natural language processing (NLP) typically sees initialization of only the lowest layer of deep models with pretrained word vectors. In this paper, we use a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation (MT) to contextualize word vectors. We show that adding these context vectors (CoVe) improves performance over using only unsupervised word and character vectors on a wide variety of common NLP tasks: sentiment analysis (SST, IMDb), question classification (TREC), entailment (SNLI), and question answering (SQuAD). For fine-grained sentiment analysis and entailment, CoVe improves performance of our baseline models to the state of the art.
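To make the pipeline concrete, the following is a minimal sketch (not the authors' released code) of how CoVe are produced and consumed: GloVe embeddings are run through a two-layer bidirectional LSTM encoder taken from an attentional sequence-to-sequence MT model, and downstream task models receive the concatenation [GloVe(w); CoVe(w)]. Class names, dimensions, and the random initialization are illustrative assumptions only; in the paper the encoder's weights come from MT pretraining.

```python
import torch
import torch.nn as nn

class CoVeEncoder(nn.Module):
    """Illustrative stand-in for the MT-trained LSTM encoder that yields CoVe."""
    def __init__(self, glove_dim=300, hidden_dim=300):
        super().__init__()
        # In the paper this encoder is pretrained on machine translation;
        # here it is randomly initialized purely for illustration.
        self.mt_lstm = nn.LSTM(glove_dim, hidden_dim, num_layers=2,
                               bidirectional=True, batch_first=True)

    def forward(self, glove_vectors):
        # glove_vectors: (batch, seq_len, glove_dim)
        cove, _ = self.mt_lstm(glove_vectors)             # (batch, seq_len, 2*hidden_dim)
        return torch.cat([glove_vectors, cove], dim=-1)   # [GloVe(w); CoVe(w)]

encoder = CoVeEncoder()
dummy_glove = torch.randn(2, 7, 300)    # batch of 2 sentences, 7 tokens each
contextualized = encoder(dummy_glove)   # (2, 7, 900), fed to the task-specific model
```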