Background. After a year and a half and over 4 million deaths, the COVID-19 pandemic remains widespread, and related topics continue to dominate the global media. Although COVID-19 diagnoses have been well monitored, neither the impacts of the disease on human behavior and social dynamics nor the effectiveness of policy interventions aimed at its containment are fully understood. Monitoring the spatial and temporal patterns of behavior, social dynamics, and policy, and then their interrelations, can provide critical information for preparatory action and effective response. Methods. Here we present an open-source dataset of 1.92 million keyword-selected Twitter posts, updated weekly from January 2020 to present, along with a dynamic dashboard showing totals at national and subnational administrative divisions. Results. The dashboard presents 100% of the geotagged tweets that contain keywords or hashtags related to COVID-19. We validated our inclusion criteria using a machine learning-based text classifier and found that 88% of the selected tweets were correctly labeled as related to COVID-19. With this information, we tested the correlation between tweets and COVID-19 diagnoses from January 1, 2020 through December 31, 2020 and found that the correlation decreased over time. Conclusions. With COVID-19 variants emerging while vaccine hesitancy and resistance persist, this dataset could be used by researchers to study numerous aspects of COVID-19 and provide valuable insights for preparing for future pandemics.