Early detection of the relevant locations in a piece of news is especially important in extreme events such as environmental disasters, wars, disease outbreaks, or political turmoil. This detection also helps recommender systems promote relevant news based on user locations. When the relevant locations are not mentioned explicitly in the text, state-of-the-art methods typically fail to recognize them because they rely on syntactic recognition. In contrast, by incorporating a knowledge base and connecting entities with their locations, our system infers the relevant locations even when they are not mentioned explicitly. To evaluate the effectiveness of our approach, and given the lack of datasets in this area, we also contribute a gold-standard multilingual news-location dataset, NewsLOC, to the research community. It contains annotations of the relevant locations (and their WikiData IDs) for 600+ Wikinews articles in five languages: English, French, German, Italian, and Spanish. Through experimental evaluations, we show that our proposed system outperforms the baselines, and that the version of the model fine-tuned on semi-supervised data increases the classification rate. The source code and the NewsLOC dataset are publicly available to the research community at https://github.com/vsuarezpaniagua/NewsLocation.
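The idea of inferring locations that the text never names, by connecting mentioned entities to their locations through a knowledge base, can be illustrated with a minimal sketch. This is not the system described above: the toy knowledge base `TOY_KB`, the function `infer_locations`, and the substring matching are all assumptions made for illustration, standing in for a real entity linker and a real knowledge base such as WikiData.

```python
from collections import Counter

# Toy knowledge base (illustrative assumption, not the authors' resource):
# lowercase entity mention -> (location name, WikiData ID of the location).
TOY_KB = {
    "eiffel tower": ("Paris", "Q90"),
    "louvre": ("Paris", "Q90"),
    "bundestag": ("Berlin", "Q64"),
}


def infer_locations(text, kb=TOY_KB):
    """Return locations implied by entities in `text`, most supported first.

    Naive substring matching stands in for entity linking; each matched
    entity casts one vote for the location it is connected to in `kb`.
    """
    lowered = text.lower()
    votes = Counter()
    for mention, location in kb.items():
        if mention in lowered:
            votes[location] += 1
    return [location for location, _ in votes.most_common()]


# The article never says "Paris", yet the linked entities point to it.
article = "Crowds gathered near the Eiffel Tower and the Louvre after the announcement."
print(infer_locations(article))
```

In this sketch a location is ranked by how many matched entities point to it, which is only one simple way of aggregating evidence; the actual system may weigh entities quite differently.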