Topic modeling is an unsupervised method for revealing the hidden semantic structure of a corpus. It is increasingly adopted as a tool in the social sciences, including political science, digital humanities, and sociological research in general. One desirable property of a topic model is that it lets users find topics describing a specific aspect of the corpus. A possible solution is to incorporate domain-specific knowledge into topic modeling, but this requires a specification from domain experts. We propose a novel query-driven topic model that allows users to specify a simple query in words or phrases and returns query-related topics, thus avoiding tedious work by domain experts. Our approach is particularly attractive when the user-specified query occurs rarely in the corpus, which makes it difficult for traditional topic models built on word co-occurrence patterns to identify relevant topics. Experimental results demonstrate the effectiveness of our model in comparison with both classical and neural topic models.