We propose a new problem, coordinated topic modeling, that imitates how humans describe a text corpus. It treats a set of well-defined topics as the axes of a semantic space with a reference representation, and it uses these axes to model a corpus so that the resulting representation is easy to understand. This task makes corpus representations more interpretable by reusing existing knowledge, and it benefits corpus comparison. We design ECTM, an embedding-based coordinated topic model that uses the reference representation to capture the aspects specific to the target corpus while maintaining each topic's global semantics. ECTM introduces topic- and document-level supervision with a self-training mechanism to solve the problem. Extensive experiments on multiple domains show that our model outperforms the baselines.
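To make the "topics as axes" idea concrete, the following is a minimal sketch and not ECTM itself: it assumes each topic's reference representation is a single embedding vector and expresses a document as a softmax over its cosine similarities to those vectors. The function names, the toy 3-dimensional embeddings, and the `temperature` parameter are all hypothetical, and the topic- and document-level supervision and self-training described above are not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def topic_coordinates(doc_vec, topic_axes, temperature=0.1):
    """Express a document as a distribution over fixed topic axes.

    Assumption: each axis is one reference embedding; the coordinates are
    a softmax over temperature-scaled cosine similarities.
    """
    names = list(topic_axes)
    scores = [cosine(doc_vec, topic_axes[n]) / temperature for n in names]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}


# Toy reference representations, invented for illustration only.
topic_axes = {
    "sports": [1.0, 0.0, 0.0],
    "politics": [0.0, 1.0, 0.0],
    "science": [0.0, 0.0, 1.0],
}
doc = [0.9, 0.1, 0.2]  # hypothetical document embedding

coords = topic_coordinates(doc, topic_axes)
dominant = max(coords, key=coords.get)
print(dominant, coords)
```

Because every corpus is projected onto the same named axes in this sketch, the resulting coordinates from two corpora can be compared directly, which is the intuition behind the corpus comparison benefit mentioned above.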