The COVID-19 pandemic has had a tremendous impact worldwide, both on everyday life and on the media landscape. We conduct a text analysis using the latent Dirichlet allocation (LDA) topic model. We first scraped 1127 articles and 5563 comments covering COVID-19 from the South China Morning Post (SCMP), published from Jan 20 to May 19. We then trained LDA models and tuned their parameters, using $C_v$ coherence as the evaluation metric. With the optimal model, we analyze the dominant topics, the representative documents of each topic, and the inconsistency between articles and comments. Finally, we discuss some factors behind this inconsistency.
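The coherence-based tuning step described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes that each candidate setting (here, the number of topics) has already produced a list of top words per topic, and it swaps in a simplified document-level NPMI score for $C_v$, which additionally uses sliding windows and indirect cosine similarity. The names `npmi_coherence`, `select_by_coherence`, `toy_docs`, and `toy_candidates` are hypothetical, and the toy data is invented for illustration only.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, docs):
    """Mean NPMI over all pairs of a topic's top words.

    Simplified stand-in for C_v (an assumption of this sketch):
    probabilities are plain document frequencies, with no sliding window.
    `top_words` needs at least two words; `docs` is a list of token sets.
    """
    n = len(docs)
    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1 = sum(w1 in d for d in docs) / n
        p2 = sum(w2 in d for d in docs) / n
        p12 = sum(w1 in d and w2 in d for d in docs) / n
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: minimum NPMI
        elif p12 == 1:
            scores.append(0.0)   # both words in every document: no information
        else:
            scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12))
    return sum(scores) / len(scores)


def select_by_coherence(candidates, docs):
    """Return the setting whose topics have the highest mean coherence.

    `candidates` maps a setting (e.g. number of topics) to that model's
    topics, each given as a list of top words.
    """
    def mean_score(topics):
        return sum(npmi_coherence(t, docs) for t in topics) / len(topics)

    return max(candidates, key=lambda k: mean_score(candidates[k]))


# Toy corpus and hand-written "model outputs" (hypothetical, for illustration).
toy_docs = [
    {"virus", "mask", "hospital"},
    {"virus", "mask", "vaccine"},
    {"market", "stock", "economy"},
    {"market", "economy", "trade"},
]
toy_candidates = {
    2: [["virus", "mask"], ["market", "economy"]],
    3: [["virus", "market"], ["mask", "stock"], ["economy", "hospital"]],
}
best_k = select_by_coherence(toy_candidates, toy_docs)
```

In a real pipeline the candidate topics would come from trained LDA models, and a library implementation of $C_v$ (for example gensim's `CoherenceModel` with `coherence='c_v'`) would likely replace the simplified score used here.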