Vaccine hesitancy has been a common concern, probably since vaccines were created. With the popularisation of social media, people started to express their concerns about vaccines online, alongside those posting pro- and anti-vaccine content. Predictably, since the first mentions of a COVID-19 vaccine, social media users have posted about their fears and concerns or about their support for and belief in the effectiveness of these rapidly developing vaccines. Identifying and understanding the reasons behind public hesitancy towards COVID-19 vaccines is important for policymakers who need to develop actions to better inform the population, with the aim of increasing vaccine uptake. In the case of COVID-19, where the fast development of the vaccines was mirrored closely by growth in anti-vaxx disinformation, automatic means of detecting citizen attitudes towards vaccination became necessary. This is an important computational social science task that requires data analysis in order to gain an in-depth understanding of the phenomena at hand. Annotated data is also necessary for training data-driven models for more nuanced analysis of attitudes towards vaccination. To this end, we created a new collection of over 3,101 tweets annotated with users' attitudes towards COVID-19 vaccination (stance). We also develop a domain-specific language model (VaxxBERT) that achieves the best predictive performance (73.0 accuracy and 69.3 F1-score) compared to a robust set of baselines. To the best of our knowledge, these are the first dataset and model that treat vaccine hesitancy as a category distinct from pro- and anti-vaccine stance.