In this paper, we propose joint learning of attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, our proposed network model efficiently predicts multiple labels.
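To make the beam-search step concrete, the sketch below keeps the top-k partial label sequences ranked by cumulative log-probability and returns the labels of the best completed sequence as a set. The decoder is stood in for by a hand-set toy scoring function (`step_log_probs`, the label names, and the scores are all illustrative assumptions, not the paper's actual LSTM); the point is only the search procedure itself.

```python
import math

# Toy stand-in for the LSTM decoder: given the labels emitted so far,
# return log-probabilities over the remaining labels plus an <end> token.
# Labels, scores, and the suppression rule are illustrative assumptions.
LABELS = ["person", "dog", "frisbee"]
END = "<end>"

def step_log_probs(prefix):
    # Already-predicted labels are removed from the candidate pool; the
    # remaining hand-set scores are normalized with a softmax.
    scores = {"person": 2.0, "dog": 1.0, "frisbee": 0.5, END: 0.0}
    for lab in prefix:
        scores.pop(lab, None)
    z = math.log(sum(math.exp(s) for s in scores.values()))
    return {lab: s - z for lab, s in scores.items()}

def beam_search(beam_width=2, max_len=4):
    beams = [([], 0.0)]          # (label sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, lp in beams:
            for lab, step_lp in step_log_probs(prefix).items():
                if lab == END:
                    finished.append((prefix, lp + step_lp))
                else:
                    candidates.append((prefix + [lab], lp + step_lp))
        if not candidates:
            break
        # Keep only the top-scoring partial sequences.
        candidates.sort(key=lambda b: b[1], reverse=True)
        beams = candidates[:beam_width]
    finished.extend(beams)
    best = max(finished, key=lambda b: b[1])
    return set(best[0])          # multi-label prediction is a label set
```

Because the final prediction is read off as a set, the search does not depend on any one fixed label ordering, which mirrors the order-free property the abstract claims.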