Sharing anti-vaccine posts on social media, including posts containing misinformation, has been shown to create confusion and reduce the public's confidence in vaccines, leading to vaccine hesitancy and resistance. Recent years have witnessed a rapid rise of such anti-vaccine posts in a variety of linguistic and visual forms in online networks, posing a great challenge for effective content moderation and tracking. Extending previous work on leveraging textual information to understand vaccine information, this paper presents Insta-VAX, a new multimodal dataset consisting of a sample of 64,957 Instagram posts related to human vaccines. We applied a crowdsourced annotation procedure, verified by two trained expert judges, to this dataset. We then benchmarked several state-of-the-art NLP and computer vision classifiers to detect whether the posts show an anti-vaccine attitude and whether they contain misinformation. Extensive experiments and analyses demonstrate that multimodal models can classify the posts more accurately than unimodal models, but they still need improvement, especially in visual context understanding and the incorporation of external knowledge. The dataset and classifiers contribute to the monitoring and tracking of vaccine discussions for social scientific and public health efforts to combat the problem of vaccine misinformation.