Network-based Trajectory Analysis of Topic Interaction Map for Text Mining of COVID-19 Biomedical Literature

Since the emergence of the COVID-19 pandemic, relevant research has been published at a rapid pace, making the field hard to follow without dedicated effort. Manual review is practically impossible given the volume of the literature. Text mining is considered a powerful approach to this challenge, especially topic modeling, a well-known unsupervised method that aims to reveal latent topics in the literature. However, despite its potential utility, the results of topic modeling are often examined manually. Hence, applying it to the COVID-19 literature is not straightforward, and expert knowledge is needed to make meaningful interpretations. To address these challenges, we propose a novel analytical framework for estimating topic interactions and visualizing them effectively for topic interpretation. We assume that the topics constituting a paper can be positioned on an interaction map that belongs to a high-dimensional Euclidean space. Based on this assumption, after summarizing topics by their topic-word distributions using the biterm topic model, we map these latent topics onto networks to visualize the relationships among them. Moreover, we develop a score that helps select meaningful words that characterize each topic. We interpret the relationships among topics by tracking how they change in a trajectory plot generated at different levels of word richness. Together, these results provide a deeply mined and intuitive representation of the relationships among topics in a specific research area. Application of the proposed framework to the PubMed literature shows that our approach facilitates understanding of the topics constituting COVID-19 knowledge.
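
The abstract does not state the word-selection score, the distance measure, or the rule used to build the network, so the following is only a minimal sketch under assumed choices: "word richness" is approximated by keeping the top-k most probable words of each topic, topics are compared by Euclidean distance, and an edge is drawn when the distance falls below a fixed threshold. The toy distributions and all function names are hypothetical; in the paper the topic-word distributions would come from the biterm topic model.

```python
import math

# Toy topic-word distributions (hypothetical values, not from the paper).
TOPIC_WORD = {
    "T1": {"vaccine": 0.5, "antibody": 0.3, "trial": 0.1, "patient": 0.1},
    "T2": {"vaccine": 0.4, "trial": 0.3, "antibody": 0.2, "patient": 0.1},
    "T3": {"lockdown": 0.5, "policy": 0.3, "patient": 0.1, "trial": 0.1},
}


def top_words(dist, k):
    """Keep the k most probable words and renormalize (assumed richness proxy)."""
    kept = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    total = sum(p for _, p in kept)
    return {w: p / total for w, p in kept}


def euclidean(a, b):
    """Euclidean distance between two sparse word distributions."""
    vocab = set(a) | set(b)
    return math.sqrt(sum((a.get(w, 0.0) - b.get(w, 0.0)) ** 2 for w in vocab))


def topic_edges(topic_word, k, threshold):
    """Return (topic, topic, distance) for topic pairs closer than threshold."""
    names = sorted(topic_word)
    reduced = {t: top_words(topic_word[t], k) for t in names}
    edges = []
    for i, s in enumerate(names):
        for t in names[i + 1:]:
            d = euclidean(reduced[s], reduced[t])
            if d < threshold:
                edges.append((s, t, d))
    return edges


def trajectory(topic_word, pair, ks):
    """Distance between one topic pair at each word-richness level k."""
    s, t = pair
    return [
        euclidean(top_words(topic_word[s], k), top_words(topic_word[t], k))
        for k in ks
    ]


if __name__ == "__main__":
    print(topic_edges(TOPIC_WORD, k=4, threshold=0.5))
    print(trajectory(TOPIC_WORD, ("T1", "T3"), ks=[1, 2, 3, 4]))
```

Plotting the output of `trajectory` against `ks` would give one curve per topic pair, a rough analogue of tracking how topic relationships change as more words are included.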
