Since the outbreak of the coronavirus, the disease has spread worldwide and drastically changed many aspects of human life. Twitter is a powerful tool that can help researchers gauge the public health response to COVID-19. Given the high volume of data produced on social networks, automated text mining approaches can help search, read, and summarize useful information. In this paper, we preprocess CORD-19, an existing medical dataset on COVID-19, and annotate it for supervised classification tasks. We release the preprocessed dataset to the research community during the ongoing COVID-19 pandemic, in the hope that it may contribute to new solutions for some of the social interventions that COVID-19 has prompted. The preprocessed version of the dataset is publicly available on GitHub.