Topic modeling is a state-of-the-art technique for analyzing text corpora. It uses a statistical model, most commonly Latent Dirichlet Allocation (LDA), to discover abstract topics that occur in a document collection. However, the LDA-based topic modeling procedure depends on a randomly selected initial configuration as well as on a number of parameter values that need to be chosen. This introduces uncertainty into the topic modeling results, and visualization methods should convey this uncertainty during the analysis process. We propose a visual, uncertainty-aware topic modeling analysis. We capture the uncertainty by computing topic modeling ensembles and propose measures for estimating topic modeling uncertainty from the ensemble. We then propose enhancements to state-of-the-art topic modeling visualization methods that convey the uncertainty in the topic modeling process. We visualize the entire ensemble of topic modeling results at different levels for topic and document analysis. We apply our visualization methods to a text corpus to document the impact of uncertainty on the analysis.
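The ensemble idea above can be illustrated with a small sketch. The abstract does not specify its uncertainty measures, so the one below is a hypothetical stand-in: each ensemble member is a list of topic-word distributions over a shared vocabulary (for example, the normalized `components_` rows of scikit-learn's `LatentDirichletAllocation` fitted with different `random_state` values), and a topic's uncertainty is taken as one minus its mean best-match cosine similarity to the topics of the other runs. Best matching is used because topic indices are not consistent across runs.

```python
import math
from statistics import mean

Topic = list[float]   # word distribution over a shared vocabulary
Run = list[Topic]     # one LDA result: K topics


def cosine(a: Topic, b: Topic) -> float:
    """Cosine similarity between two topic-word vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def topic_uncertainty(ensemble: list[Run], reference: int = 0) -> list[float]:
    """Hypothetical per-topic uncertainty for the topics of one reference run.

    For each reference topic, find its most similar topic in every other run
    (topic order differs between runs), average those best-match similarities,
    and report 1 - average. 0 means the topic reappears in every run.
    """
    if len(ensemble) < 2:
        raise ValueError("an ensemble needs at least two runs")
    ref = ensemble[reference]
    scores = []
    for topic in ref:
        best = [max(cosine(topic, other) for other in run)
                for i, run in enumerate(ensemble) if i != reference]
        scores.append(1.0 - mean(best))
    return scores


# Toy ensemble: 3 runs, K = 2 topics, vocabulary of 4 words.
ensemble = [
    [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
    [[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 0.0, 0.0]],  # same topics, permuted
    [[0.5, 0.5, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0]],  # second topic drifted
]
print([round(u, 3) for u in topic_uncertainty(ensemble)])  # prints [0.0, 0.25]
```

The first topic is recovered by every run and gets an uncertainty near zero, while the second only partially reappears in the third run and is flagged as less stable. A visualization could encode such per-topic scores, for example, as color or opacity.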