Vax-Culture: A Dataset for Studying Vaccine Discourse on Twitter

Vaccine hesitancy continues to be a major challenge for public health officials during the COVID-19 pandemic. Because this hesitancy undermines vaccination campaigns, many researchers have sought to identify its root causes and have found that the growing volume of anti-vaccine misinformation on social media platforms is a key element of the problem. We explored Twitter as a source of misleading content, with the goal of extracting the overlapping cultural and political beliefs that motivate the spread of vaccine misinformation. To do this, we collected a dataset of vaccine-related tweets and annotated them with the help of a team of annotators with backgrounds in communications and journalism. Ultimately, we hope this work can lead to effective, targeted public health communication strategies for reaching individuals with anti-vaccine beliefs. This information can also support the development of machine learning models that automatically detect vaccine misinformation posts and help counter their negative impact. In this paper, we present Vax-Culture, a novel Twitter COVID-19 dataset consisting of 6,373 vaccine-related tweets accompanied by an extensive set of human-provided annotations, including the vaccine-hesitancy stance, an indication of any misinformation in the tweet, the entities criticized and supported in each tweet, and the message each tweet communicates. In addition, we define five baseline tasks (four classification tasks and one sequence generation task) and report the results of a set of recent transformer-based models on them. The dataset and code are publicly available at https://github.com/mrzarei5/Vax-Culture.
