Network-based Trajectory Topic Interaction Map for Text Mining of COVID-19 Biomedical Literature

Since the emergence of the worldwide COVID-19 pandemic, relevant research has been published at a dazzling pace, which makes it hard to follow the field without dedicated effort. The volume of relevant literature makes it practically impossible to do this manually. Text mining is considered a powerful approach to this challenge, especially topic modeling, a well-known unsupervised method that aims to reveal latent topics in the literature. However, in spite of its potential utility, the results of topic modeling usually have to be inspected manually. Hence, its application to the COVID-19 literature is not straightforward, and expert knowledge is needed to make meaningful interpretations. To address these challenges, we propose a novel analytical framework for estimating topic interactions and visualizing them effectively for topic interpretation. We assume that the topics constituting a paper can be positioned on an interaction map, which belongs to a high-dimensional Euclidean space. Based on this assumption, after summarizing topics by their topic-word distributions using the biterm topic model, we map these latent topics onto networks to visualize the relationships among them. Moreover, in the proposed approach, changes in the relationships among topics can be traced using a trajectory plot generated with different levels of word richness. Together, these results provide a deeply mined and intuitive representation of the relationships among topics in a specific research area. Applying the proposed framework to the PubMed literature shows that our approach facilitates understanding of the topics constituting the COVID-19 knowledge.
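The pipeline outlined in the abstract (topic-word distributions, positions in a Euclidean space, a network of topics, and a trajectory across levels of word richness) can be illustrated with a small sketch. The abstract does not specify how the interaction map is estimated, so everything below is an assumption made for illustration: "word richness" is taken to mean the number of top words kept per topic, plain Euclidean distance between truncated topic-word profiles stands in for the map, and the names `top_word_profile`, `topic_distances`, `topic_network`, and `distance_trajectory` are hypothetical. The toy matrix `phi` stands in for the output of a biterm topic model and is not real data.

```python
import numpy as np

def top_word_profile(phi, n_words):
    """Keep each topic's n_words most probable words, zero the rest, renormalize rows.

    Assumption: this truncation is our stand-in for a "level of word richness".
    """
    phi = np.asarray(phi, dtype=float)
    truncated = np.zeros_like(phi)
    top_idx = np.argsort(phi, axis=1)[:, -n_words:]   # indices of the largest entries per row
    rows = np.arange(phi.shape[0])[:, None]
    truncated[rows, top_idx] = phi[rows, top_idx]
    return truncated / truncated.sum(axis=1, keepdims=True)

def topic_distances(profile):
    """Pairwise Euclidean distances between topic profiles (K x K matrix)."""
    diff = profile[:, None, :] - profile[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def topic_network(dist, threshold):
    """Boolean adjacency matrix: connect topics closer than `threshold` (hypothetical rule)."""
    adj = dist < threshold
    np.fill_diagonal(adj, False)                      # no self-loops
    return adj

def distance_trajectory(phi, richness_levels):
    """Stack one distance matrix per richness level: shape (levels, K, K)."""
    return np.stack([topic_distances(top_word_profile(phi, n)) for n in richness_levels])

# Toy topic-word matrix: 3 topics over a 4-word vocabulary (illustrative values only).
phi = np.array([
    [0.5, 0.3, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
    [0.1, 0.1, 0.3, 0.5],
])

traj = distance_trajectory(phi, richness_levels=[2, 4])
adj = topic_network(traj[0], threshold=0.5)
print(traj.shape)   # one K x K distance matrix per richness level
print(adj)          # which topics are linked at the coarsest level
```

In this toy setting, topics 0 and 1 share their dominant words and end up linked, while topic 2 stays isolated. Reading `traj[:, i, j]` across levels gives a simple trajectory of how the distance between topics `i` and `j` changes as more words are included.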
