Current news datasets focus almost exclusively on the text of articles and rarely leverage image features, leaving out information that is essential for news classification. In this paper, we propose a new dataset, N15News, built from the New York Times, which covers 15 categories and provides both text and image information for each article. We design a novel multitask multimodal network with different fusion methods, and our experiments show that multimodal news classification outperforms text-only classification. Depending on the text length, classification accuracy can be improved by up to 5.8%. Our study reveals the relationship between the performance of a multimodal classifier and that of its sub-classifiers, as well as the improvements attainable by applying multimodal methods to news classification. N15News thus shows great potential to advance multimodal news studies.
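As a rough illustration of the kind of multimodal fusion described above, the sketch below shows a simple late-fusion classifier that concatenates a pre-computed text embedding and image embedding before a small classification head. This is a minimal example only: the backbone encoders, feature dimensions (here assumed to be 768), hidden size, and fusion strategy are assumptions for illustration, not the paper's actual architecture.

```python
# Minimal late-fusion sketch (illustrative only; the actual fusion methods
# and encoder backbones are not specified in this abstract).
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Concatenate a text embedding and an image embedding, then classify."""
    def __init__(self, text_dim=768, image_dim=768, num_classes=15):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(512, num_classes),
        )

    def forward(self, text_feat, image_feat):
        # text_feat: (batch, text_dim), e.g. a pooled embedding from a text encoder
        # image_feat: (batch, image_dim), e.g. a pooled embedding from an image encoder
        fused = torch.cat([text_feat, image_feat], dim=-1)
        return self.head(fused)

# Usage with random tensors standing in for real encoder outputs.
model = LateFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 15])
```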