We introduce SCRum-9, the largest multilingual Stance Classification dataset for Rumour analysis, covering 9 languages and containing 7,516 tweets from X. SCRum-9 goes beyond existing stance classification datasets by covering more languages, linking examples to more fact-checked claims (2.1k), and including confidence-related annotations from multiple annotators to account for intra- and inter-annotator variability. Annotations were made by at least two native speakers per language, totalling more than 405 hours of annotation and 8,150 dollars in compensation. Further, SCRum-9 is used to benchmark five large language models (LLMs) and two multilingual masked language models (MLMs) in in-context learning (ICL) and fine-tuning setups. This paper also innovates by exploring the use of multilingual synthetic data for rumour stance classification, showing that even LLMs with weak ICL performance can produce valuable synthetic data for fine-tuning small MLMs, enabling them to outperform zero-shot ICL with LLMs. Finally, we examine the relationship between model predictions and human uncertainty on ambiguous cases, finding that model predictions often match the second-choice labels assigned by annotators rather than diverging entirely from human judgments. SCRum-9 is publicly released to the research community to foster further research on multilingual analysis of misleading narratives on social media.