Recently, the relationship between automated and human evaluation of topic models has been called into question. Method developers have staked the efficacy of new topic model variants on automated measures, and their failure to approximate human preferences places these models on uncertain ground. Moreover, existing evaluation paradigms are often divorced from real-world use. Motivated by content analysis as a dominant real-world use case for topic modeling, we analyze two related aspects of topic models that affect their effectiveness and trustworthiness in practice for that purpose: the stability of their estimates and the extent to which the model's discovered categories align with human-determined categories in the data. We find that neural topic models fare worse in both respects compared to an established classical method. We take a step toward addressing both issues in tandem by demonstrating that a straightforward ensembling method can reliably outperform the members of the ensemble.
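The abstract mentions ensembling topic models without detailing the method. Below is a minimal sketch of one common "pool and cluster" ensembling strategy, offered purely for illustration and not as the paper's actual approach: fit several LDA runs with different seeds, pool their topic-word distributions, cluster the pooled topics, and average each cluster into an ensemble topic. The corpus, topic count, and clustering choice here are all assumptions for the example.

```python
# Illustrative topic-model ensembling sketch (assumed "pool and cluster" strategy,
# not necessarily the method described in the paper).
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

# Small example corpus (downloads the 20 Newsgroups dataset).
docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data[:2000]
X = CountVectorizer(max_features=5000, stop_words="english").fit_transform(docs)

n_topics, n_runs = 20, 5
topic_word = []
for seed in range(n_runs):
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lda.fit(X)
    # Normalize rows so each topic is a probability distribution over the vocabulary.
    topic_word.append(lda.components_ / lda.components_.sum(axis=1, keepdims=True))

# Pool topics from all runs, then group similar topics across runs.
pooled = np.vstack(topic_word)  # shape: (n_runs * n_topics, vocab_size)
clusters = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit_predict(pooled)

# Each ensemble topic is the mean of the per-run topics in its cluster;
# averaging across runs smooths out the run-to-run instability of any single model.
ensemble_topics = np.array(
    [pooled[clusters == k].mean(axis=0) for k in range(n_topics)]
)
print(ensemble_topics.shape)  # (n_topics, vocab_size)
```

The intuition matches the abstract's two concerns: averaging over independent runs reduces the variance of the topic estimates (stability), and the consolidated topics can be compared against human-determined categories just like any single model's topics.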