Topic Modelling (TM) is a research branch of natural language understanding (NLU) and natural language processing (NLP) that facilitates insightful analysis of large document collections and datasets, for example by summarising their main topics and how those topics change over time. This kind of discovery is becoming increasingly popular in real-life applications because of its value for big data analytics. In this study, situated in the social-media and healthcare domains, we apply popular Latent Dirichlet Allocation (LDA) methods to model topic changes in Swedish newspaper articles about the coronavirus. We describe the corpus we created, comprising 6515 articles, the methods applied, and statistics on topic changes over a period of approximately one year and two months, from 17 January 2020 to 13 March 2021. We hope this work can serve as an asset for grounding applications of topic modelling and inspire similar case studies in an era of pandemics, supporting socio-economic impact research as well as clinical and healthcare analytics. Our data and source code are openly available at https://github.com/poethan/Swed_Covid_TM

Keywords: Latent Dirichlet Allocation (LDA); Topic Modelling; Coronavirus; Pandemics; Natural Language Understanding; BERT-topic
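As a rough illustration of the approach summarised above, the following sketch fits LDA to tokenised articles and then averages the per-article topic proportions by calendar month to expose topic changes over time. It is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation (which lives in the linked repository). It assumes collapsed Gibbs sampling as the inference method, ISO-formatted publication dates, and a monthly average as the topic-change statistic. The function names `fit_lda` and `monthly_topic_shares`, the hyperparameters `alpha`, `beta` and `iterations`, and the toy Swedish tokens are all hypothetical.

```python
import random
from collections import defaultdict


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (an assumed inference method).

    docs: list of token lists. Returns (theta, vocab, topic_word), where
    theta[d][k] is the estimated proportion of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    K, V = num_topics, len(vocab)
    doc_topic = [[0] * K for _ in docs]       # n_{d,k}: tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(K)]  # n_{k,w}: count of word w in topic k
    topic_total = [0] * K                     # n_k: tokens assigned to topic k
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    topics = range(K)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised full conditional p(z = t | all other tokens).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed document-topic proportions; each row sums to 1.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + K * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    return theta, vocab, topic_word


def monthly_topic_shares(theta, dates):
    """Average document-topic proportions per calendar month.

    Assumed topic-change statistic. dates are ISO strings such as
    "2020-01-17"; the month key is "YYYY-MM".
    """
    sums, counts = {}, defaultdict(int)
    for row, date in zip(theta, dates):
        month = date[:7]
        if month not in sums:
            sums[month] = [0.0] * len(row)
        sums[month] = [s + p for s, p in zip(sums[month], row)]
        counts[month] += 1
    return {m: [s / counts[m] for s in sums[m]] for m in sorted(sums)}


if __name__ == "__main__":
    # Hypothetical toy corpus of pre-tokenised Swedish articles.
    docs = [
        ["vaccin", "dos", "region", "vaccin"],
        ["smitta", "fall", "utbrott", "smitta"],
        ["vaccin", "region", "dos"],
        ["fall", "smitta", "sjukhus"],
    ]
    dates = ["2020-03-02", "2020-03-15", "2021-01-10", "2021-01-20"]
    theta, vocab, _ = fit_lda(docs, num_topics=2, iterations=100)
    for month, shares in monthly_topic_shares(theta, dates).items():
        print(month, [round(s, 2) for s in shares])
```

For a corpus the size of 6515 articles, a library implementation of LDA (for example in gensim or scikit-learn) would be the practical choice; the explicit Gibbs loop here favours readability over speed.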