At the time of writing, COVID-19 (coronavirus disease 2019) has spread to more than 220 countries and territories. Since the outbreak, the seriousness of the pandemic has made people more active on social media, especially on microblogging platforms such as Twitter and Weibo. Pandemic-specific discourse has been trending on these platforms for months. Previous studies have confirmed that such socially generated conversations contribute to situational awareness of crisis events. Early forecasts of cases are essential for authorities to estimate the resources needed to cope with the spread of the virus. Therefore, this study attempts to incorporate public discourse into the design of forecasting models targeted specifically at the steep-hill region of an ongoing wave. We propose a sentiment-involved, topic-based methodology for designing multiple time series from publicly available COVID-19-related Twitter conversations. As a use case, we apply the proposed methodology to Australian COVID-19 daily cases and Twitter conversations generated within the country. Experimental results (i) show the presence of latent social media variables that Granger-cause the daily COVID-19 confirmed cases, and (ii) confirm that those variables offer additional prediction capability to forecasting models. Further, the results show that including social media variables in the models yields 48.83–51.38% improvements in RMSE over the baseline models. We also release a large-scale, COVID-19-specific, geotagged global tweets dataset, MegaGeoCOV, to the public, anticipating that geotagged data of this scale will aid in understanding the conversational dynamics of the pandemic through other spatial and temporal contexts.
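To make the Granger-causality claim concrete, the following is a minimal, dependency-free sketch of the standard F-test that compares an autoregression of the target series with and without lagged values of a candidate predictor. It is not the authors' implementation: the function names (`solve`, `rss`, `granger_f`), the single-lag default, and the synthetic `tweets` and `cases` series are all assumptions made for illustration, and the numbers it produces say nothing about the paper's data.

```python
import random


def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(x, y, lags=1):
    """F statistic for the null hypothesis that lagged x does not help predict y."""
    rows = range(lags, len(y))
    target = [y[t] for t in rows]
    # Restricted model: intercept plus lags of y only.
    restricted = [[1.0] + [y[t - i] for i in range(1, lags + 1)] for t in rows]
    # Unrestricted model: the same regressors plus lags of x.
    unrestricted = [r + [x[t - i] for i in range(1, lags + 1)]
                    for r, t in zip(restricted, rows)]
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic, made-up data: `tweets` drives `cases` with a one-day delay.
random.seed(0)
tweets = [random.gauss(0, 1) for _ in range(200)]
cases = [0.0]
for t in range(1, 200):
    cases.append(0.5 * cases[t - 1] + 0.8 * tweets[t - 1] + random.gauss(0, 0.5))

f_forward = granger_f(tweets, cases)  # do lagged tweets help predict cases?
f_reverse = granger_f(cases, tweets)  # do lagged cases help predict tweets?
print(f"tweets -> cases F = {f_forward:.2f}")
print(f"cases -> tweets F = {f_reverse:.2f}")
```

A large F statistic in one direction and a small one in the other is the pattern expected when one series leads the other. The test assumes stationary inputs, so real case counts and tweet-derived series would usually need differencing or a similar transformation first, and the statistic would be compared against an F distribution with `lags` and `dof` degrees of freedom to obtain a p-value.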