Deep neural networks have achieved state-of-the-art results on a wide range of vision and language tasks. Despite being trained on large datasets, most models iterate over single input-output pairs, ignoring the rest of the training data when making each prediction. In this work, we actively exploit the training data, using information from the nearest training examples to aid prediction both at training and at test time. Specifically, our approach uses the target of the most similar training example to initialize the memory state of an LSTM model, or to guide attention mechanisms. We apply this approach to image captioning and sentiment analysis, through image and text retrieval, respectively. Results confirm the effectiveness of the proposed approach on both tasks, using the widely adopted Flickr8k and IMDB datasets. Our code is publicly available at http://github.com/RitaRamo/retrieval-augmentation-nn.
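To make the core idea concrete, the following is a minimal sketch in PyTorch of one of the two variants described above: retrieving the nearest training example and using an encoding of its target to initialize the LSTM memory state. All names here (RetrievalAugmentedDecoder, nearest_training_target, the tanh-projected initialization) are illustrative assumptions, not the authors' implementation; the actual code is at the repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrievalAugmentedDecoder(nn.Module):
    """Hypothetical LSTM decoder whose memory state is initialized from
    the encoded target of the nearest training example."""

    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTMCell(embed_dim, hidden_dim)
        # Project the retrieved target's encoding into the LSTM state space.
        self.init_h = nn.Linear(embed_dim, hidden_dim)
        self.init_c = nn.Linear(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, retrieved_target_enc):
        # Initialize hidden and memory states from the nearest neighbor's
        # target encoding (an assumption; the paper may combine this with
        # the usual input encoding).
        h = torch.tanh(self.init_h(retrieved_target_enc))
        c = torch.tanh(self.init_c(retrieved_target_enc))
        logits = []
        for t in range(tokens.size(1)):
            x = self.embed(tokens[:, t])          # (batch, embed_dim)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)         # (batch, seq_len, vocab)

def nearest_training_target(query_feat, train_feats, train_target_encs):
    """Cosine-similarity retrieval over precomputed training features,
    returning the encoding of the most similar example's target."""
    sims = F.cosine_similarity(query_feat.unsqueeze(0), train_feats, dim=1)
    return train_target_encs[sims.argmax()]
```

For image captioning, query_feat would be the CNN features of the input image and train_target_encs the encoded captions of the training images; for sentiment analysis, retrieval would be over text features instead.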