Distinguishing between misinformation and real information is one of the most challenging problems in today's interconnected world. The vast majority of state-of-the-art methods for detecting misinformation are fully supervised and require a large number of high-quality human annotations. However, the availability of such annotations cannot be taken for granted, since producing them is costly, time-consuming, and difficult to do at a pace that keeps up with the proliferation of misinformation. In this work, we explore scenarios where the number of annotations is limited. In such scenarios, we investigate how tapping into a diverse set of resources that characterize a news article, henceforth referred to as "aspects", can compensate for the lack of labels. Our contributions are twofold: 1) we propose the use of three different aspects: article content, context of social sharing behaviors, and host website/domain features, and 2) we introduce a principled tensor-based embedding framework that combines all those aspects effectively. We propose HiJoD, a 2-level decomposition pipeline that not only outperforms state-of-the-art methods, with F1-scores of 74% and 81% on the Twitter and Politifact datasets respectively, but is also an order of magnitude faster than similar ensemble approaches.
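The abstract does not spell out the decomposition itself, so the following is only a rough illustration of what a two-level pipeline over several aspects could look like, not the HiJoD algorithm. It assumes each aspect is already available as an articles-by-features matrix, and it substitutes a plain truncated SVD for the tensor-based embedding the paper describes. The function names, ranks, and random data are all hypothetical.

```python
import numpy as np


def truncated_svd_embedding(matrix, rank):
    """Embed the rows of `matrix` in `rank` dimensions.

    Stand-in for a tensor decomposition: uses the leading left
    singular vectors scaled by their singular values.
    """
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank] * s[:rank]


def two_level_embedding(aspects, rank_per_aspect, joint_rank):
    """Hypothetical two-level scheme (an assumption, not the paper's method).

    Level 1: embed every aspect separately.
    Level 2: stack the per-aspect embeddings side by side and
             decompose again to get one joint embedding per article.
    """
    level1 = [truncated_svd_embedding(a, rank_per_aspect) for a in aspects]
    stacked = np.hstack(level1)
    return truncated_svd_embedding(stacked, joint_rank)


# Synthetic placeholder data: 50 articles, three aspects.
rng = np.random.default_rng(0)
n_articles = 50
content = rng.random((n_articles, 40))  # e.g. article text features
social = rng.random((n_articles, 30))   # e.g. social sharing context
domain = rng.random((n_articles, 20))   # e.g. host website/domain features

joint = two_level_embedding(
    [content, social, domain], rank_per_aspect=5, joint_rank=3
)
print(joint.shape)  # (50, 3)
```

In a low-label setting, a joint embedding of this kind would typically feed a semi-supervised step that spreads the few known labels to nearby articles; that step is omitted here because the abstract gives no details about it.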