Massive rumors spreading alongside breaking news or trending topics severely obscure the truth. Because sufficient corpora from the same domain are available for model training, existing rumor detection algorithms perform well on yesterday's news. However, lacking training data and prior expert knowledge, they struggle to spot rumors about unforeseen events, especially those propagated in different languages (i.e., low-resource regimes). In this paper, we propose a unified contrastive transfer framework that detects rumors by adapting features learned from well-resourced rumor data to low-resourced rumor data. More specifically, we first represent each rumor circulating on social media as an undirected topology, and then train a Multi-scale Graph Convolutional Network via a unified contrastive paradigm. Our model explicitly breaks down domain and/or language barriers through language alignment and a novel domain-adaptive contrastive learning mechanism. To enhance representation learning from a small set of target events, we reveal that the rumor-indicative signal is closely correlated with the uniformity of the distribution of these events. We design a target-wise contrastive training mechanism with three data augmentation strategies, which unifies the representations by distinguishing target events. Extensive experiments on four low-resource datasets collected from real-world microblog platforms demonstrate that our framework substantially outperforms state-of-the-art methods and exhibits a superior capacity for detecting rumors at early stages.
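The abstract does not spell out the Multi-scale GCN or the exact form of its domain-adaptive and target-wise contrastive objectives. As a minimal sketch, assuming a propagation tree given as hypothetical (parent, child) reply edges, the snippet below illustrates two generic ingredients the abstract names: dropping edge direction to obtain an undirected topology, followed by a standard symmetric-normalized GCN propagation step, and a contrastive loss that pulls together graph embeddings sharing a veracity label regardless of their domain or language. All function names are illustrative. The single-scale, parameter-free propagation stands in for the paper's multi-scale network, and the standard supervised contrastive (SupCon) loss is a swapped-in objective, not the paper's own loss.

```python
import math

# Hypothetical sketch: function names, the single-scale parameter-free
# propagation, and the SupCon loss are illustrative assumptions, not the
# paper's Multi-scale GCN or its domain-adaptive contrastive objective.


def undirected_adjacency(edges, num_nodes):
    """Symmetric adjacency (with self-loops) from (parent, child) reply edges."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0  # self-loop: each post keeps its own features
    for parent, child in edges:
        # Ignore reply direction so information flows both ways.
        adj[parent][child] = 1.0
        adj[child][parent] = 1.0
    return adj


def normalize(adj):
    """Standard GCN normalization D^{-1/2} A D^{-1/2}."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(adj_norm, feats):
    """One propagation step: each node aggregates its normalized neighbourhood."""
    n, d = len(feats), len(feats[0])
    return [[sum(adj_norm[i][k] * feats[k][c] for k in range(n))
             for c in range(d)] for i in range(n)]


def mean_pool(feats):
    """Graph-level embedding: average of the node embeddings."""
    n, d = len(feats), len(feats[0])
    return [sum(row[c] for row in feats) / n for c in range(d)]


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard zero vectors
    return [x / norm for x in vec]


def sup_con_loss(embeddings, labels, tau=0.5):
    """Supervised contrastive loss: graphs sharing a label are positives,
    regardless of the domain or language they come from."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without a positive are skipped
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / tau
                for a in range(n) if a != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        # Average -log softmax probability over all positives of anchor i.
        total -= sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Toy tree: post 0 is the source, 1 and 2 reply to it, 3 replies to 1.
    adj = normalize(undirected_adjacency([(0, 1), (0, 2), (1, 3)], 4))
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    print(mean_pool(propagate(adj, feats)))

    # Four graph embeddings, e.g. pooled from source- and target-domain events.
    embs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(sup_con_loss(embs, [1, 1, 0, 0]))  # labels match embedding clusters
    print(sup_con_loss(embs, [1, 0, 1, 0]))  # labels cut across clusters: higher loss
```

Treating the tree as undirected lets a reply influence its parent as well as the reverse. Because the contrastive loss compares labels rather than domains, it rewards representations in which rumors from a well-resourced source and a low-resourced target land near each other, which is the intuition behind transferring across domains and languages.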