Convincing people to get vaccinated against COVID-19 is a key societal challenge today. As a first step towards this goal, many prior works have relied on social media analysis to understand the specific concerns that people have about these vaccines, such as potential side effects, ineffectiveness, and political factors. Though there are datasets that broadly classify social media posts into Anti-Vax and Pro-Vax labels, there is no dataset (to our knowledge) that labels social media posts according to the specific anti-vaccine concerns mentioned in them. In this paper, we have curated CAVES, the first large-scale dataset containing about 10k COVID-19 anti-vaccine tweets labelled with specific anti-vaccine concerns in a multi-label setting. This is also the first multi-label classification dataset that provides explanations for each of the labels. Additionally, the dataset provides class-wise summaries of all the tweets. We also perform preliminary experiments on the dataset and show that it is very challenging for multi-label explainable classification and tweet summarization, as is evident from the moderate scores achieved by some state-of-the-art models. Our dataset and code are available at: https://github.com/sohampoddar26/caves-data