VaxxHesitancy: A Dataset for Studying Hesitancy towards COVID-19 Vaccination on Twitter

Vaccine hesitancy has been a common concern, probably since vaccines were first created. With the popularisation of social media, people began to express their concerns about vaccines online, alongside those posting pro- and anti-vaccine content. Predictably, since the first mentions of a COVID-19 vaccine, social media users have posted about their fears and concerns, or about their support for and belief in the effectiveness of these rapidly developing vaccines. Identifying and understanding the reasons behind public hesitancy towards COVID-19 vaccines is important for policy makers who need to develop actions to better inform the population, with the aim of increasing vaccine take-up. In the case of COVID-19, where the fast development of the vaccines was mirrored closely by growth in anti-vaxx disinformation, automatic means of detecting citizen attitudes towards vaccination became necessary. This is an important computational social science task that requires data analysis in order to gain an in-depth understanding of the phenomena at hand. Annotated data is also necessary for training data-driven models for more nuanced analysis of attitudes towards vaccination. To this end, we created a new collection of over 3,101 tweets annotated with users' attitudes towards COVID-19 vaccination (stance). We also develop a domain-specific language model (VaxxBERT) that achieves the best predictive performance (73.0 accuracy and 69.3 F1-score) compared with a robust set of baselines. To the best of our knowledge, these are the first dataset and model that treat vaccine hesitancy as a category distinct from pro- and anti-vaccine stance.
