In 2020, COVID-19 became the world's chief concern, and it is still widely reflected across social networks. Each day, users post millions of tweets and comments on the subject, which carry significant implicit information about public opinion. To extract how people in various countries felt in the early stages of the outbreak, we collect a dataset of more than two million English-language COVID-related tweets posted between March 23 and June 23, 2020. First, we use a lexicon-based approach in conjunction with the GeoNames geographic database to label the tweets with their locations. Next, we propose a method based on the recently introduced and widely cited RoBERTa model to analyze their sentiment. We then produce trend graphs of tweet frequency and sentiment for the world and for the nations that were more heavily affected by COVID-19. Graph analysis shows that, for the majority of nations, the tweet-frequency graphs are significantly correlated with the official statistics of daily cases in those nations. Moreover, several pieces of implicit knowledge are extracted and discussed.
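The location-labeling and correlation steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny `GAZETTEER` dictionary is a hypothetical stand-in for the GeoNames database, the function names are invented for illustration, and Pearson's coefficient is an assumption, since the abstract does not state which correlation measure was used.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical toy gazetteer standing in for GeoNames: place name -> country.
GAZETTEER = {
    "new york": "United States",
    "usa": "United States",
    "london": "United Kingdom",
    "uk": "United Kingdom",
    "milan": "Italy",
    "italy": "Italy",
}

# Longer names come first so multi-word places win over shorter alternatives.
_NAMES = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in _NAMES) + r")\b",
    re.IGNORECASE,
)


def label_country(text):
    """Return the country of the leftmost gazetteer match in text, or None."""
    match = _PATTERN.search(text)
    return GAZETTEER[match.group(1).lower()] if match else None


def daily_counts(tweets, country, days):
    """Count tweets labeled with `country` for each day in `days`.

    `tweets` is an iterable of (day, text) pairs.
    """
    counts = Counter(day for day, text in tweets if label_country(text) == country)
    return [counts[day] for day in days]


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

Under these assumptions, a country's tweet-frequency series from `daily_counts` could be passed to `pearson` together with that country's official daily case series over the same days. A real gazetteer lookup would also need to handle ambiguous place names, which this sketch ignores.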