Local news articles are a subset of news that impacts users in a specific geographical area, such as a city, county, or state. Detecting local news (Step 1) and subsequently determining its geographical location and radius of impact (Step 2) are two important steps towards accurate local news recommendation. Naive rule-based methods, such as detecting city names in the news title, tend to give erroneous results because they do not take the news content into account. Empowered by the latest developments in natural language processing, we develop an integrated pipeline that enables automatic local news detection and content-based local news recommendation. In this paper, we focus on Step 1 of the pipeline, which features: (1) a weakly supervised framework that incorporates domain knowledge and automatic data processing, and (2) scalability to multilingual settings. Compared with the Stanford CoreNLP NER model, our pipeline achieves higher precision and recall when evaluated on a real-world, human-labeled dataset. This pipeline has the potential to deliver more precise local news to users, help local businesses get more exposure, and give people more information about the safety of their neighborhoods.
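To illustrate why title-based rules are brittle, here is a minimal sketch of the kind of naive baseline described above. It is not the pipeline proposed in this paper; the gazetteer `CITY_NAMES`, the function `naive_is_local`, and the sample titles are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, tiny gazetteer of city names (an assumption for this sketch).
CITY_NAMES = {"paris", "springfield", "phoenix"}


def naive_is_local(title: str) -> bool:
    """Flag a title as local news if any word matches a known city name."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return any(token in CITY_NAMES for token in tokens)


# Made-up titles showing one correct hit and two typical failure modes.
titles = [
    "Springfield council approves new bike lanes",  # local, and flagged
    "Paris Hilton launches new fragrance line",     # not local, but flagged
    "Water main break closes downtown streets",     # local, but missed
]

for title in titles:
    print(naive_is_local(title), title)
```

The second title is a false positive (a person's name matches a city) and the third is a false negative (a local event with no city name in the title), which are the two kinds of error a content-aware detector would aim to reduce.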