We study the problem of profiling news media on the Web with respect to the factuality of their reporting and their bias. This is an important but under-studied problem related to disinformation and "fake news" detection; it addresses the issue at a coarser granularity than looking at an individual article or an individual claim. This is useful because it allows entire media outlets to be profiled in advance. Unlike previous work, which has focused primarily on text (e.g., the text of the articles published by the target website, or the textual descriptions in their social media profiles or on Wikipedia), our main focus here is on modeling the similarity between media outlets based on the overlap of their audiences. This is motivated by homophily, i.e., the tendency of people to connect with others who have similar interests, which we extend to media, hypothesizing that similar types of media are read by similar kinds of users. In particular, we propose GREENER (GRaph nEural nEtwork for News mEdia pRofiling), a model that builds a graph of inter-media connections based on their audience overlap and then uses graph neural networks to represent each medium. We find that such representations are quite useful for predicting the factuality and the bias of news media outlets, yielding improvements over the state-of-the-art results reported on two datasets. When these representations are augmented with conventionally used representations obtained from news articles, Twitter, YouTube, Facebook, and Wikipedia, prediction performance improves by 2.5–27 macro-F1 points for the two tasks.
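The abstract does not specify how audience overlap is measured or which graph neural network architecture GREENER uses, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes (1) each outlet has a set of audience identifiers, (2) overlap is measured with Jaccard similarity, with a hypothetical cut-off `min_overlap`, and (3) a single GCN-style layer with symmetric degree normalization, ReLU(D^-1/2 (A + I) D^-1/2 X W), turns per-outlet input features into graph-aware representations. The function names, outlet names, and toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def audience_overlap_graph(audiences, min_overlap=0.1):
    """Build a weighted, undirected media graph from audience sets.

    Assumption (not from the source): overlap is Jaccard similarity and
    edges below the hypothetical threshold `min_overlap` are dropped.
    """
    media = sorted(audiences)
    edges = {}
    for a, b in combinations(media, 2):
        union = audiences[a] | audiences[b]
        if not union:
            continue
        jaccard = len(audiences[a] & audiences[b]) / len(union)
        if jaccard >= min_overlap:
            edges[(a, b)] = jaccard
    return media, edges


def gcn_layer(media, edges, features, weight):
    """One GCN-style step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    `features` maps each medium to an input vector; `weight` is an
    in_dim x out_dim matrix given as a list of rows.
    """
    n = len(media)
    idx = {m: i for i, m in enumerate(media)}
    # Weighted adjacency matrix with self-loops (A + I).
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), w_ab in edges.items():
        adj[idx[a]][idx[b]] = w_ab
        adj[idx[b]][idx[a]] = w_ab
    deg = [sum(row) for row in adj]
    # X W: project each medium's input features.
    projected = [
        [sum(xi * wi for xi, wi in zip(features[m], col)) for col in zip(*weight)]
        for m in media
    ]
    out = {}
    for i, m in enumerate(media):
        row = []
        for k in range(len(weight[0])):
            # Aggregate neighbours, normalized by sqrt(deg_i * deg_j).
            s = sum(
                adj[i][j] / sqrt(deg[i] * deg[j]) * projected[j][k]
                for j in range(n)
            )
            row.append(max(0.0, s))  # ReLU
        out[m] = row
    return out


if __name__ == "__main__":
    # Hypothetical toy data: three outlets and their audiences.
    audiences = {
        "outlet_a": {"u1", "u2", "u3"},
        "outlet_b": {"u2", "u3", "u4"},
        "outlet_c": {"u9"},
    }
    media, edges = audience_overlap_graph(audiences)
    print(edges)  # {('outlet_a', 'outlet_b'): 0.5}
    features = {"outlet_a": [1.0, 0.0], "outlet_b": [0.0, 1.0], "outlet_c": [1.0, 1.0]}
    print(gcn_layer(media, edges, features, [[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch, outlets that share readers (here `outlet_a` and `outlet_b`) mix their features, while an isolated outlet keeps its own. In a full pipeline, such node representations would then be fed, possibly concatenated with text-based features, into classifiers for factuality and bias; stacking several layers and learning `weight` are omitted here for brevity.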