Graph-based Trajectory Visualization for Text Mining of COVID-19 Biomedical Literature

Since the emergence of the COVID-19 pandemic, relevant research has been published at a dazzling pace, making it hard to follow the field without dedicated effort. Given the volume of the literature, doing so manually is practically impossible. Text mining is considered a powerful approach to this challenge, especially topic modeling, a well-known unsupervised method that aims to reveal latent topics in the literature. Despite its potential utility, however, the results of topic modeling are often inspected manually. Hence, applying it to the COVID-19 literature is not straightforward, and expert knowledge is needed to interpret the results meaningfully. To address these challenges, we propose a novel analytical framework for effective visualization and mining of topic modeling results. We assume that the topics constituting a paper can be positioned on an interaction map that lies in a high-dimensional Euclidean space. Based on this assumption, after summarizing topics by their topic-word distributions using the biterm topic model, we map these latent topics onto networks to visualize the relationships among them. Moreover, the proposed approach traces how relationships among topics change using a trajectory plot generated at different levels of word richness. Together, these results provide a deeply mined and intuitive representation of the relationships among topics in a specific research area. Applying the proposed framework to the PubMed literature shows that our approach facilitates understanding of the topics that constitute COVID-19 knowledge.
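
To make the idea of a topic network traced across levels of word richness more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a topic-word probability matrix is already available from some topic model (the toy values below are invented), that "word richness" can be approximated by keeping only each topic's `top_k` most probable words, and that two topics are linked when the cosine similarity of their truncated distributions reaches a threshold. The function names `topic_similarity` and `topic_edges`, the cosine measure, and the threshold value are all illustrative assumptions.

```python
import numpy as np


def topic_similarity(topic_word, top_k):
    """Cosine similarity between topics using only each topic's top_k words.

    Assumption: truncating to the top_k words stands in for a "level of
    word richness"; the source does not specify how richness is defined.
    """
    topic_word = np.asarray(topic_word, dtype=float)
    truncated = np.zeros_like(topic_word)
    # Indices of the top_k highest-probability words in each topic (row).
    top_idx = np.argsort(topic_word, axis=1)[:, -top_k:]
    rows = np.arange(topic_word.shape[0])[:, None]
    truncated[rows, top_idx] = topic_word[rows, top_idx]
    # Normalize rows to unit length so the dot product is cosine similarity.
    unit = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)
    return unit @ unit.T


def topic_edges(similarity, threshold):
    """Undirected edges (i, j), i < j, whose similarity meets the threshold."""
    n = similarity.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if similarity[i, j] >= threshold]


# Toy topic-word matrix: 3 topics over a 5-word vocabulary (invented values).
topic_word = np.array([
    [0.50, 0.30, 0.12, 0.08, 0.00],
    [0.45, 0.05, 0.35, 0.15, 0.00],
    [0.00, 0.10, 0.05, 0.30, 0.55],
])

# The "trajectory": how the edge set changes as more words per topic are kept.
trajectory = {k: topic_edges(topic_similarity(topic_word, k), threshold=0.14)
              for k in (1, 3, 5)}

for k, edges in trajectory.items():
    print(k, edges)
```

In this toy example, topics 0 and 1 share their dominant word and are linked at every level, while topic 2 only becomes connected once the full vocabulary is considered. A real analysis would replace the hand-written matrix with the topic-word distributions estimated by the chosen topic model.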
