Vaccine hesitancy is considered one of the main causes of the stagnant uptake rate of COVID-19 vaccines in Europe and the US, where vaccine supply is sufficient. A fast and accurate grasp of public attitudes toward vaccination is critical to addressing vaccine hesitancy, and social media platforms have proved to be an effective source of public opinion. In this paper, we describe the collection and release of a dataset of tweets related to COVID-19 vaccines. The dataset consists of the IDs of 2,198,090 tweets collected from Western Europe, 17,934 of which are annotated with their originators' vaccination stances. Our annotations will facilitate the use and development of data-driven models that extract vaccination attitudes from social media posts, and thus further confirm the power of social media in public health surveillance. To lay the groundwork for future research, we not only perform statistical analysis and visualisation of our dataset, but also evaluate and compare the performance of established text-based benchmarks in vaccination stance extraction. Finally, we demonstrate one potential practical use of our data: tracking temporal changes in public attitudes toward COVID-19 vaccination.