Network-based procedures for topic detection in huge text collections offer an intuitive alternative to probabilistic topic models. We present in detail a method that is designed specifically with the requirements of domain experts in mind. Like similar methods, it employs community detection in term co-occurrence graphs, but it is enhanced by a resolution parameter that can be used to change the targeted topic granularity. We also establish a term ranking and use semantic word embeddings to present term communities in a way that facilitates their interpretation. We demonstrate the application of our method on a widely used corpus of general news articles and report the results of detailed evaluations, by social-science experts, of topics detected at various resolutions. A comparison with topics detected by Latent Dirichlet Allocation is also included. Finally, we discuss factors that influence topic interpretation.
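The abstract does not specify which community-detection algorithm or objective is used. The following is a minimal sketch that assumes the common resolution-parameterized modularity, $Q_\gamma = \sum_c \left[ L_c/m - \gamma \, (d_c / 2m)^2 \right]$, where $L_c$ is the internal edge weight of community $c$, $d_c$ the summed degree of its terms, and $m$ the total edge weight. It also assumes that co-occurrence is counted at the document level. Instead of running a full algorithm such as Louvain or Leiden, it scores three hand-picked partitions of a toy graph. This is enough to show how raising $\gamma$ shifts the preferred granularity from one coarse community, to topic-sized communities, to single terms. The documents, terms and function names (`cooccurrence_graph`, `modularity`, `best_partition`) are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Weighted term co-occurrence graph (assumption: edge weight =
    number of documents in which both terms occur)."""
    weights = defaultdict(int)
    for doc in docs:
        # Sorted pairs give each undirected edge a single canonical key.
        for a, b in combinations(sorted(set(doc)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def modularity(weights, partition, gamma=1.0):
    """Generalized modularity Q_gamma = sum_c [L_c/m - gamma*(d_c/2m)^2]."""
    m = sum(weights.values())
    strength = defaultdict(float)  # weighted degree of each term
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    community_of = {t: i for i, comm in enumerate(partition) for t in comm}
    q = 0.0
    for i, comm in enumerate(partition):
        # Total weight of edges with both endpoints inside community i.
        internal = sum(w for (a, b), w in weights.items()
                       if community_of[a] == i and community_of[b] == i)
        degree = sum(strength[t] for t in comm)
        q += internal / m - gamma * (degree / (2 * m)) ** 2
    return q


# Hypothetical toy corpus: a politics topic and a sports topic,
# linked by a single bridging document.
docs = [
    ["election", "vote", "party"],
    ["election", "vote"],
    ["match", "goal", "team"],
    ["goal", "team"],
    ["vote", "team"],
]
graph = cooccurrence_graph(docs)
terms = sorted({t for pair in graph for t in pair})

# Hand-picked candidate partitions at three granularities.
candidates = {
    "coarse": [set(terms)],
    "topics": [{"election", "vote", "party"}, {"match", "goal", "team"}],
    "singletons": [{t} for t in terms],
}


def best_partition(gamma):
    """Name of the candidate partition with the highest Q_gamma."""
    return max(candidates,
               key=lambda name: modularity(graph, candidates[name], gamma))


for gamma in (0.1, 1.0, 5.0):
    print(gamma, best_partition(gamma))
```

Running the loop prints `coarse` for $\gamma = 0.1$, `topics` for $\gamma = 1.0$ and `singletons` for $\gamma = 5.0$. The penalty term grows with $\gamma$, so larger communities must earn proportionally more internal weight to be preferred, which is how a resolution parameter can steer the targeted topic granularity.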