Embedded topic models can learn interpretable topics even with large, heavy-tailed vocabularies. However, they generally assume a Euclidean embedding space, which fundamentally limits their ability to capture hierarchical relations. To address this, we present a novel framework that uses hyperbolic embeddings to represent words and topics. Owing to the tree-like geometry of hyperbolic space, the underlying semantic hierarchy among words and topics can be better exploited to mine more interpretable topics. Furthermore, because hyperbolic geometry is well suited to representing hierarchical data, tree-structured prior knowledge can be naturally injected to guide the learning of a topic hierarchy. We therefore further develop a regularization term, based on contrastive learning, that injects prior structural knowledge efficiently. Experiments on both topic taxonomy discovery and document representation demonstrate that the proposed framework achieves improved performance over existing embedded topic models.
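To make the tree-likeness property concrete, the following minimal sketch computes geodesic distances in the Poincaré ball, one common model of hyperbolic space. The abstract does not say which hyperbolic model the framework uses, so the choice of the Poincaré ball is an assumption, and the function name `poincare_distance` and the toy points `root`, `leaf_a`, and `leaf_b` are hypothetical and chosen only for illustration.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball.

    Assumption: the Poincare ball stands in for "hyperbolic space" here;
    the abstract does not specify the model the framework relies on.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    # Clamp the denominator so points very close to the boundary stay finite.
    denom = max((1.0 - sq_norm_u) * (1.0 - sq_norm_v), eps)
    return math.acosh(1.0 + 2.0 * sq_dist / denom)


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy layout: a general concept at the origin and two specific
# concepts near the boundary, pointing in different directions.
root = (0.0, 0.0)
leaf_a = (0.9, 0.0)
leaf_b = (0.0, 0.9)

# Ratio of the direct leaf-to-leaf distance to the path through the root.
hyperbolic_ratio = poincare_distance(leaf_a, leaf_b) / (
    poincare_distance(leaf_a, root) + poincare_distance(root, leaf_b)
)
euclidean_ratio = euclidean_distance(leaf_a, leaf_b) / (
    euclidean_distance(leaf_a, root) + euclidean_distance(root, leaf_b)
)

print(f"hyperbolic ratio: {hyperbolic_ratio:.3f}")
print(f"euclidean ratio:  {euclidean_ratio:.3f}")
```

In this toy layout the hyperbolic ratio is closer to 1 than the Euclidean one: the shortest path between the two leaves is nearly as long as the detour through the root, which is how distances behave in a tree. This is only an illustration of the geometric intuition, not the embedding or training procedure of the proposed framework.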