Topic Modelling (TM) is a research branch of natural language understanding (NLU) and natural language processing (NLP) that facilitates insightful analysis of large documents and datasets, such as summarising the main topics and tracking how topics change. This kind of discovery is becoming more popular in real-life applications because of its impact on big data analytics. In this study, situated in the social-media and healthcare domains, we apply popular Latent Dirichlet Allocation (LDA) methods to model topic changes in Swedish newspaper articles about Coronavirus. We describe the corpus we created, which comprises 6515 articles, the methods applied, and statistics on topic changes over a period of approximately one year and two months, from 17th January 2020 to 13th March 2021. We hope this work can serve as a grounding resource for applications of topic modelling and inspire similar case studies in an era of pandemics, supporting socio-economic impact research as well as clinical and healthcare analytics. Our data and source code are openly available at https://github.com/poethan/Swed_Covid_TM

Keywords: Latent Dirichlet Allocation (LDA); Topic Modelling; Coronavirus; Pandemics; Natural Language Understanding
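To make the idea of modelling topic changes with LDA concrete, the following is a minimal sketch and not the authors' pipeline (their actual code is in the repository linked above). It assumes documents are already tokenised and labelled with a time period, fits LDA with collapsed Gibbs sampling (one standard inference method, chosen here only because it needs no third-party library), and averages the per-document topic proportions within each period. The function names `fit_lda` and `topic_shares_by_period`, the hyperparameter values, and the toy English tokens are all hypothetical.

```python
import random
from collections import defaultdict


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]             # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word v is assigned to topic k
    topic_total = [0] * num_topics                           # tokens assigned to topic k overall
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions (each row sums to 1).
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def topic_shares_by_period(theta, periods):
    """Average the document-topic proportions of all documents in each period."""
    grouped = defaultdict(list)
    for row, period in zip(theta, periods):
        grouped[period].append(row)
    return {
        period: [sum(col) / len(rows) for col in zip(*rows)]
        for period, rows in sorted(grouped.items())
    }


# Hypothetical toy corpus: tokenised documents with a year-month label each.
docs = [
    ["virus", "outbreak", "travel", "virus"],
    ["outbreak", "travel", "border", "virus"],
    ["vaccine", "dose", "rollout", "vaccine"],
    ["dose", "rollout", "vaccine", "clinic"],
]
periods = ["2020-01", "2020-01", "2021-03", "2021-03"]

theta = fit_lda(docs, num_topics=2)
shares = topic_shares_by_period(theta, periods)
for period, share in shares.items():
    print(period, [round(s, 3) for s in share])
```

Comparing the averaged proportions across periods gives a simple view of how much each topic is discussed over time. On a real corpus, an established library implementation of LDA and proper Swedish preprocessing (tokenisation, stop-word removal) would be the more practical choice.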