Changes in European Solidarity Before and During COVID-19: Evidence from a Large Crowd- and Expert-Annotated Twitter Dataset

We introduce the well-established social scientific concept of social solidarity and its contestation, anti-solidarity, as a new problem setting for supervised machine learning in NLP, and use it to assess how European solidarity discourses changed before and after the COVID-19 outbreak was declared a global pandemic. To this end, we annotate 2.3k English and German tweets for (anti-)solidarity expressions, using multiple human annotators and two annotation approaches (experts vs. crowds). We use these annotations to train a BERT model with multiple data augmentation strategies. Our augmented BERT model, which combines both expert and crowd annotations, outperforms the baseline BERT classifier trained on expert annotations only by over 25 points, raising macro-F1 from 58% to almost 85%. We use this high-quality model to automatically label over 270k tweets posted between September 2019 and December 2020. We then assess how statements related to European (anti-)solidarity discourses in the automatically labeled data developed over time and in relation to one another, before and during the COVID-19 crisis. Our results show that solidarity became increasingly salient and contested during the crisis. While the number of solidarity tweets remained at a higher level and dominated the discourse throughout the scrutinized time frame, anti-solidarity tweets initially spiked, then decreased to (almost) pre-COVID-19 values before rising to a stable higher level until the end of 2020.
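The headline result above is stated in macro-F1, the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many tweets it contains. As a rough illustration of the metric only, here is a minimal pure-Python sketch; the label names and toy predictions are hypothetical and are not the paper's label set, data, or evaluation code.

```python
from collections import Counter


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    # True positives per class: items where prediction matches the gold label.
    tp = Counter(g for g, p in zip(gold, pred) if g == p)
    pred_count = Counter(pred)  # how often each class was predicted
    gold_count = Counter(gold)  # how often each class occurs in the gold labels
    f1_scores = []
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / gold_count[label] if gold_count[label] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels, for illustration only.
gold = ["solidarity", "solidarity", "anti-solidarity", "other"]
pred = ["solidarity", "anti-solidarity", "anti-solidarity", "other"]
print(round(macro_f1(gold, pred), 4))  # 0.7778
```

In the toy example, "solidarity" and "anti-solidarity" each get F1 = 2/3 and "other" gets 1.0, so the macro average is 7/9. A frequent class cannot mask poor performance on a rarer one such as anti-solidarity, which is one reason macro-F1 is a common choice for imbalanced label sets.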
