Most rumour detection models for social media are designed for one specific language (mostly English). There are over 40 languages on Twitter, and most of them lack the annotated resources needed to build rumour detection models. In this paper we propose a zero-shot cross-lingual transfer learning framework that adapts a rumour detection model trained on a source language to a different target language. Our framework utilises pretrained multilingual language models (e.g., multilingual BERT) and a self-training loop that iteratively bootstraps the creation of "silver labels" in the target language, which are then used to adapt the model from the source language to the target language. We evaluate our methodology on English and Chinese rumour datasets and demonstrate that our model substantially outperforms competitive benchmarks for rumour detection in both the source and the target language.
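The self-training idea can be illustrated with a minimal sketch. This is not the authors' implementation. As an assumption, a toy nearest-centroid classifier over precomputed feature vectors stands in for the fine-tuned multilingual BERT model. The helpers `train_centroids`, `predict_proba` and `self_train`, along with the `threshold` and `max_iters` parameters, are hypothetical. Each round retrains on the gold source-language labels plus the silver labels collected so far. It then adds target-language examples whose predicted label is sufficiently confident as new silver labels.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Example = Tuple[Vector, int]  # (feature vector, label); 1 = rumour, 0 = non-rumour


def train_centroids(data: List[Example]) -> Dict[int, Vector]:
    """Hypothetical stand-in for fine-tuning: one mean vector (centroid) per label."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_proba(centroids: Dict[int, Vector], x: Vector) -> Dict[int, float]:
    """Softmax over negative squared distances to each centroid."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(c, x)) for y, c in centroids.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def self_train(source: List[Example], target: List[Vector],
               threshold: float = 0.9, max_iters: int = 5):
    """Iteratively add confident target-language predictions as silver labels."""
    train_set = list(source)  # starts with gold source-language labels only
    pool = list(target)       # unlabelled target-language examples
    silver: List[Example] = []
    for _ in range(max_iters):
        model = train_centroids(train_set)
        confident, remaining = [], []
        for x in pool:
            probs = predict_proba(model, x)
            label, conf = max(probs.items(), key=lambda kv: kv[1])
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:  # nothing new to add: stop bootstrapping
            break
        silver.extend(confident)
        train_set.extend(confident)
        pool = remaining
    # Final model is retrained on gold source labels plus all silver labels.
    return train_centroids(train_set), silver


# Toy usage: two labelled "source" points, three unlabelled "target" points.
source = [([1.0, 1.0], 1), ([-1.0, -1.0], 0)]
target = [[0.8, 0.9], [-0.9, -0.7], [0.05, 0.0]]
model, silver = self_train(source, target)
print(silver)  # [([0.8, 0.9], 1), ([-0.9, -0.7], 0)]
```

The ambiguous target point `[0.05, 0.0]` never clears the confidence threshold, so it stays unlabelled. Thresholding in this way is one common guard against noisy pseudo-labels reinforcing themselves across rounds.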