The COVID-19 pandemic continues to generate a wide range of topics that are discussed and debated on social media. To explore the impact of pandemics on people's lives, it is crucial to understand the public's concerns about and attitudes toward pandemic-related entities (e.g., drugs, vaccines) expressed on social media. However, models trained on existing named entity recognition (NER) or targeted sentiment analysis (TSA) datasets have limited ability to understand COVID-19-related social media texts, because these datasets were not designed or annotated from a medical perspective. This paper releases METS-CoV, a dataset containing medical entities and targeted sentiments from COVID-19-related tweets. METS-CoV contains 10,000 tweets annotated with 7 entity types: 4 medical types (Disease, Drug, Symptom, and Vaccine) and 3 general types (Person, Location, and Organization). To further investigate tweet users' attitudes toward specific entities, 4 entity types (Person, Organization, Drug, and Vaccine) are selected and annotated with user sentiments, yielding a targeted sentiment dataset of 9,101 entities in 5,278 tweets. To the best of our knowledge, METS-CoV is the first dataset to collect medical entities and their corresponding sentiments from COVID-19-related tweets. We benchmark classical machine learning models and state-of-the-art deep learning models on the NER and TSA tasks with extensive experiments. The results show that existing models leave substantial room for improvement on both tasks. METS-CoV is an important resource for developing better medical social media tools and for facilitating computational social science research, especially in epidemiology. Our data, annotation guidelines, benchmark models, and source code are publicly available (https://github.com/YLab-Open/METS-CoV) to ensure reproducibility.
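To make the two annotation layers concrete, the following is a minimal, hypothetical Python sketch of how one annotated tweet could be represented: every entity carries one of the 7 types, and only Person, Organization, Drug, and Vaccine entities may carry a sentiment. The class names, the character-offset convention, and the three-way sentiment label set are assumptions for illustration only, not the released METS-CoV file format; consult the repository's annotation guidelines for the actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# The 7 entity types named in the dataset description.
MEDICAL_TYPES = {"Disease", "Drug", "Symptom", "Vaccine"}
GENERAL_TYPES = {"Person", "Location", "Organization"}
# Only these 4 types are annotated for targeted sentiment.
SENTIMENT_TARGET_TYPES = {"Person", "Organization", "Drug", "Vaccine"}
# ASSUMPTION: a three-way label set, used here for illustration only.
SENTIMENT_LABELS = {"positive", "neutral", "negative"}


@dataclass
class Entity:
    start: int  # character offset, inclusive (hypothetical convention)
    end: int    # character offset, exclusive
    type: str
    sentiment: Optional[str] = None  # None for entities without a sentiment label


@dataclass
class Tweet:
    text: str
    entities: list[Entity] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if any entity violates the annotation constraints."""
        for ent in self.entities:
            if ent.type not in MEDICAL_TYPES | GENERAL_TYPES:
                raise ValueError(f"unknown entity type: {ent.type}")
            if not (0 <= ent.start < ent.end <= len(self.text)):
                raise ValueError(f"span out of range: {ent}")
            if ent.sentiment is not None:
                if ent.type not in SENTIMENT_TARGET_TYPES:
                    raise ValueError(f"{ent.type} entities are not sentiment targets")
                if ent.sentiment not in SENTIMENT_LABELS:
                    raise ValueError(f"unknown sentiment: {ent.sentiment}")

    def tsa_targets(self) -> list[tuple[str, str]]:
        """Return (surface text, sentiment) pairs used by the TSA task."""
        return [
            (self.text[e.start:e.end], e.sentiment)
            for e in self.entities
            if e.sentiment is not None
        ]
```

For example, in the made-up tweet "The vaccine gave me a headache.", "vaccine" (offsets 4 to 11) would be a Vaccine entity with a sentiment label, while "headache" (offsets 22 to 30) would be a Symptom entity that appears only in the NER layer, since Symptom is not one of the sentiment target types.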