The consumption of news has changed significantly as the Web has become the most influential medium for information. To analyze and contextualize the large volume of news published every day, the geographic focus of an article is an important attribute, because it enables content-based news retrieval. Methods and datasets exist for estimating geolocation from text or from photos, but the two are typically treated as separate tasks. However, a photo might lack geographical cues, and a text can mention multiple locations, which makes it challenging to recognize the focus location from a single modality. In this paper, we introduce a novel dataset called Multimodal Focus Location of News (MM-Locate-News). We evaluate state-of-the-art methods on this new benchmark and propose novel models that predict the focus location of news from both its textual and its image content. The experimental results show that the multimodal model outperforms the unimodal models.
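The following sketch illustrates why combining modalities can help when each one alone is ambiguous. It uses weighted log-linear (product-of-experts) late fusion of two unimodal location classifiers. This is a generic technique chosen for illustration, not the model proposed in the paper. The function name `late_fusion`, the equal weighting, and all probabilities are hypothetical toy values, not results from MM-Locate-News.

```python
import math


def late_fusion(text_probs, image_probs, text_weight=0.5):
    """Combine per-location probabilities from two unimodal models.

    Hypothetical illustration of weighted log-linear late fusion:
    score(loc) = w * log p_text(loc) + (1 - w) * log p_image(loc),
    renormalized into a probability distribution with a softmax.
    """
    eps = 1e-9  # avoids log(0) for locations a model assigns no mass
    locations = set(text_probs) | set(image_probs)
    scores = {
        loc: text_weight * math.log(text_probs.get(loc, 0.0) + eps)
        + (1.0 - text_weight) * math.log(image_probs.get(loc, 0.0) + eps)
        for loc in locations
    }
    # Softmax over the fused log-scores (shifted by the max for stability).
    max_score = max(scores.values())
    exp_scores = {loc: math.exp(s - max_score) for loc, s in scores.items()}
    total = sum(exp_scores.values())
    return {loc: v / total for loc, v in exp_scores.items()}


# Toy scenario (made-up numbers): the text mentions Paris and Berlin almost
# equally often, while the photo is a generic street scene that weakly
# favors Rome. Assume, for illustration, that the true focus is Berlin.
text_probs = {"Paris": 0.48, "Berlin": 0.47, "Rome": 0.05}
image_probs = {"Paris": 0.15, "Berlin": 0.35, "Rome": 0.50}

text_only = max(text_probs, key=text_probs.get)     # text alone: "Paris"
image_only = max(image_probs, key=image_probs.get)  # image alone: "Rome"
fused = late_fusion(text_probs, image_probs)
fused_focus = max(fused, key=fused.get)             # fused: "Berlin"
print(text_only, image_only, fused_focus)
```

With equal weights, the fused scores are proportional to the geometric mean of the two unimodal probabilities, so a location must be plausible under both modalities to win. Here neither model alone selects Berlin, but the combination does.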