Multilingual topic models enable crosslingual tasks by extracting consistent topics from multilingual corpora. Most models require parallel or comparable training corpora, which limits their ability to generalize. In this paper, we first demystify the knowledge transfer mechanism behind multilingual topic models by defining an alternative but equivalent formulation. Based on this analysis, we then relax the assumption of training data required by most existing models, creating a model that only requires a dictionary for training. Experiments show that our new method effectively learns coherent multilingual topics from partially and fully incomparable corpora with limited amounts of dictionary resources.
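To make the dictionary-based transfer mechanism concrete, here is a minimal toy sketch, not the paper's actual model or code: a collapsed Gibbs sampler for LDA in which translation pairs from a small bilingual dictionary are mapped to shared "concept" ids, so topic counts accumulated on tokens in one language directly inform the topic distribution over their translations in the other. The dictionary, corpora, topic count, and hyperparameters below are all invented for illustration.

```python
# Toy sketch of dictionary-based knowledge transfer in a multilingual
# topic model -- NOT the paper's model. Translation pairs share one
# "concept" id, so topic-concept counts gathered from English tokens
# inform the topics of their German translations, and vice versa.
import random
from collections import defaultdict

random.seed(0)

K = 2                   # number of topics (chosen for illustration)
ALPHA, BETA = 0.5, 0.1  # symmetric Dirichlet hyperparameters (assumed)

# Hypothetical English->German dictionary entries
dictionary = {"dog": "hund", "cat": "katze", "money": "geld", "bank": "bank"}

# Link each translation pair to one shared concept id
concept = {}
for cid, (en, de) in enumerate(dictionary.items()):
    concept[en] = cid
    concept[de] = cid

# Tiny incomparable corpora: documents are not translations of each other
docs = [
    ["dog", "cat", "dog", "cat", "leash"],   # English
    ["money", "bank", "money", "loan"],      # English
    ["hund", "katze", "hund", "tierarzt"],   # German
    ["geld", "bank", "geld", "zins"],        # German
]

# Out-of-dictionary words get their own private concept ids
next_cid = len(dictionary)
for doc in docs:
    for w in doc:
        if w not in concept:
            concept[w] = next_cid
            next_cid += 1
V = next_cid  # total number of concepts

# Random topic initialization and count tables
z = [[random.randrange(K) for _ in doc] for doc in docs]
ndk = [[0] * K for _ in docs]      # document-topic counts
nkc = [[0] * V for _ in range(K)]  # topic-concept counts, shared across languages
nk = [0] * K                       # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        ndk[d][t] += 1
        nkc[t][concept[w]] += 1
        nk[t] += 1

# Collapsed Gibbs sampling over concept-level counts
for _ in range(500):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            c, t = concept[w], z[d][i]
            ndk[d][t] -= 1; nkc[t][c] -= 1; nk[t] -= 1
            weights = [(ndk[d][k] + ALPHA) * (nkc[k][c] + BETA) / (nk[k] + V * BETA)
                       for k in range(K)]
            t = random.choices(range(K), weights=weights)[0]
            z[d][i] = t
            ndk[d][t] += 1; nkc[t][c] += 1; nk[t] += 1

# Report top concepts per topic, with the words (in both languages) they cover
words_of = defaultdict(list)
for w, c in concept.items():
    words_of[c].append(w)
for t in range(K):
    top = sorted(range(V), key=lambda c: nkc[t][c], reverse=True)[:3]
    print(f"topic {t}:", [words_of[c] for c in top])
```

Because the counts live at the concept level, a document in either language updates the same topic-concept table; this is one simple way to see how a dictionary alone, with no parallel or comparable documents, can tie topics together across languages.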