Most existing news datasets focus solely on the textual content of news articles and rarely leverage image features, discarding information that is essential for news classification. In this paper, we propose a new dataset, N24News, which is generated from the New York Times, covers 24 categories, and contains both text and image information for each article. Using a multitask multimodal method, our experimental results show that multimodal news classification outperforms text-only classification. Depending on the length of the text, classification accuracy can be improved by up to 8.11%. Our research reveals the relationship between the performance of a multimodal classifier and that of its sub-classifiers, as well as the potential gains from applying multimodal methods to news classification. N24News thus shows great potential to promote multimodal news research.