In this paper, we provide an overview of the WNUT-2020 shared task on the identification of informative COVID-19 English Tweets. We describe how we construct a corpus of 10K Tweets and organize the development and evaluation phases for this task. We also present a brief summary of the results obtained from the final system evaluation submissions of 55 teams, finding that (i) many systems obtain very high performance, with F1 scores of up to 0.91, (ii) the majority of the submissions achieve substantially higher results than the fastText baseline (Joulin et al., 2017), and (iii) fine-tuning pre-trained language models on relevant language data followed by supervised training performs well in this task.