During the COVID-19 pandemic, the scientific literature related to SARS-CoV-2 has grown dramatically, both in the number of publications and in its impact on people's lives. This literature covers a varied set of sensitive topics, ranging from vaccination, to the efficacy of protective equipment, to the evaluation of lockdown policies. To date, hundreds of thousands of papers have been uploaded to online repositories and published in scientific journals. As a result, developing digital methods that allow an in-depth exploration of this growing literature has become an important issue, both to identify the topical trends of COVID-related research and to zoom in on its sub-themes. This work proposes a novel methodology, called LDA2Net, which combines topic modelling and network analysis to investigate topics beneath their surface. Specifically, LDA2Net exploits the frequencies of pairs of consecutive words to reconstruct the network structure of the topics discussed in the CORD-19 corpus. The results suggest that the effectiveness of topic models can be increased by enriching them with word network representations, and by using these representations to display, analyse, and explore COVID-related topics at different levels of granularity.
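The idea of combining consecutive-word frequencies with topic information can be illustrated with a small sketch. This is not the LDA2Net procedure itself, whose details the abstract does not give. It assumes a toy corpus of pre-tokenized documents, a hypothetical dictionary of topic-word probabilities (such as a fitted LDA model might provide), and an illustrative edge weight equal to the bigram count multiplied by the topic probabilities of both words. The names `bigram_counts`, `topic_network`, and `vaccine_topic` are introduced here for illustration only.

```python
from collections import Counter


def bigram_counts(docs):
    """Count ordered pairs of consecutive tokens across tokenized documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def topic_network(counts, word_probs):
    """Build a directed, weighted word network for a single topic.

    Assumption (illustrative only): the weight of edge w1 -> w2 is the
    bigram count times the topic probability of each of the two words.
    Edges whose weight is zero are dropped.
    """
    edges = {}
    for (w1, w2), n in counts.items():
        weight = n * word_probs.get(w1, 0.0) * word_probs.get(w2, 0.0)
        if weight > 0:
            edges[(w1, w2)] = weight
    return edges


# Toy corpus: each document is already a list of tokens.
docs = [
    ["vaccine", "efficacy", "trial"],
    ["vaccine", "efficacy", "study"],
    ["lockdown", "policy", "evaluation"],
]

# Hypothetical topic-word probabilities for one "vaccine" topic.
vaccine_topic = {"vaccine": 0.5, "efficacy": 0.25, "trial": 0.125, "study": 0.125}

counts = bigram_counts(docs)
network = topic_network(counts, vaccine_topic)

# List edges from heaviest to lightest.
for (w1, w2), weight in sorted(network.items(), key=lambda kv: -kv[1]):
    print(f"{w1} -> {w2}: {weight}")
```

In this sketch, word pairs that are frequent in the corpus but irrelevant to the topic (here, the lockdown bigrams) receive zero weight and drop out, so each topic gets its own sub-network that can be displayed or filtered at different weight thresholds.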