This work outlines how we prioritize original news, a critical indicator of news quality. By examining the landscape and life cycle of news posts on our social media platform, we identify the challenges of building and deploying an originality score. We pursue an approach based on normalized PageRank values and three-step clustering, and refresh the score hourly to capture the dynamics of online news. We describe a near real-time system architecture, evaluate our methodology, and deploy it to production. Our empirical results validate individual components and show that prioritizing original news increases user engagement with news and improves proprietary cumulative metrics.