Topic models are widely used analysis techniques for clustering documents and surfacing thematic elements of text corpora. These models remain challenging to optimize and often require a "human-in-the-loop" approach in which domain experts use their knowledge to steer and adjust them. However, the fragility, incompleteness, and opacity of these models mean that even minor changes could induce large and potentially undesirable changes in the resulting model. In this paper, we conduct a simulation-based analysis of human-centered interactions with topic models, with the objective of measuring the sensitivity of topic models to common classes of user actions. We find that the impacts of user interactions differ in magnitude but often degrade the quality of the resulting model in ways that can be difficult for the user to evaluate. We suggest incorporating sensitivity and "multiverse" analyses into topic model interfaces to surface and overcome these deficiencies.