At the time of writing, COVID-19 (Coronavirus disease 2019) has spread to more than 220 countries and territories. Since the outbreak, the seriousness of the pandemic has made people more active on social media, especially on microblogging platforms such as Twitter and Weibo. Pandemic-specific discourse has been trending on these platforms for months. Previous studies have confirmed that such socially generated conversations contribute to situational awareness of crisis events. Early forecasts of cases are essential for authorities to estimate the resources needed to cope with the spread of the virus. This study therefore attempts to incorporate public discourse into the design of forecasting models, particularly those targeted at the steep-hill region of an ongoing wave. We propose a sentiment-involved, topic-based latent variable search methodology for designing forecasting models from publicly available Twitter conversations. As a use case, we apply the proposed methodology to Australian COVID-19 daily cases and Twitter conversations generated within the country. Experimental results (i) show the presence of latent social media variables that Granger-cause the daily COVID-19 confirmed cases, and (ii) confirm that those variables offer additional prediction capability to forecasting models. Further, the results show that including social media variables yields a 48.83--51.38% improvement in RMSE over the baseline models. We also release MegaGeoCOV, a large-scale dataset of COVID-19-specific geotagged global tweets, to the public, anticipating that geotagged data of this scale will aid in understanding the conversational dynamics of the pandemic across other spatial and temporal contexts.
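To make the two experimental claims concrete, the following is a minimal sketch of a Granger-causality F-test and an RMSE comparison, assuming NumPy is available. It is not the authors' pipeline: the series `signal` and `cases` are synthetic toy data, the helper names (`lagged_design`, `fit_rss`, `granger_f`, `rmse_improvement`) are hypothetical, the lag order is fixed at 1, and the RMSE is computed in-sample rather than on held-out forecasts. The test compares a restricted autoregression of the target on its own lags with an unrestricted one that also includes lags of the candidate variable.

```python
import numpy as np


def lagged_design(y, x=None, lag=1):
    """Return (design, target): design has a constant, `lag` lags of y,
    and (if x is given) `lag` lags of x; target is y from index `lag` on."""
    n = len(y)
    cols = [np.ones(n - lag)]
    cols += [y[lag - k:n - k] for k in range(1, lag + 1)]
    if x is not None:
        cols += [x[lag - k:n - k] for k in range(1, lag + 1)]
    return np.column_stack(cols), y[lag:]


def fit_rss(design, target):
    """Ordinary least squares fit; return the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f(y, x, lag=1):
    """F statistic for H0: lags of x do not help predict y."""
    d_restricted, target = lagged_design(y, None, lag)
    d_unrestricted, _ = lagged_design(y, x, lag)
    rss_r = fit_rss(d_restricted, target)
    rss_u = fit_rss(d_unrestricted, target)
    dof = len(target) - d_unrestricted.shape[1]
    f_stat = ((rss_r - rss_u) / lag) / (rss_u / dof)
    return f_stat, rss_r, rss_u


def rmse_improvement(rss_baseline, rss_model):
    """Percentage reduction in RMSE relative to the baseline (same sample size)."""
    return 100.0 * (1.0 - np.sqrt(rss_model / rss_baseline))


# Synthetic toy data (assumption): `signal` stands in for a social media
# variable, and `cases` depends on its own past and on yesterday's signal.
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
cases = np.zeros(n)
for t in range(1, n):
    cases[t] = 0.5 * cases[t - 1] + 2.0 * signal[t - 1] + 0.1 * rng.normal()

f_stat, rss_r, rss_u = granger_f(cases, signal, lag=1)
print("F statistic (signal -> cases):", f_stat)
print("In-sample RMSE improvement (%):", rmse_improvement(rss_r, rss_u))
```

A large F statistic is evidence against the null hypothesis that the candidate variable adds nothing. A p-value would come from the F distribution with `lag` and `dof` degrees of freedom, and in practice the lag order would be selected rather than fixed.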