CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines

Convincing people to get vaccinated against COVID-19 is a key societal challenge in the present times. As a first step towards this goal, many prior works have relied on social media analysis to understand the specific concerns that people have towards these vaccines, such as potential side-effects, ineffectiveness, political factors, and so on. Though there are datasets that broadly classify social media posts into Anti-Vax and Pro-Vax labels, there is no dataset (to our knowledge) that labels social media posts according to the specific anti-vaccine concerns mentioned in the posts. In this paper, we have curated CAVES, the first large-scale dataset containing about 10k COVID-19 anti-vaccine tweets labelled with various specific anti-vaccine concerns in a multi-label setting. This is also the first multi-label classification dataset that provides explanations for each of the labels. Additionally, the dataset provides class-wise summaries of all the tweets. We also perform preliminary experiments on the dataset and show that it is a very challenging dataset for multi-label explainable classification and tweet summarization, as is evident from the moderate scores achieved by some state-of-the-art models. Our dataset and code are available at: https://github.com/sohampoddar26/caves-data

