We propose a multi-scale hybridized topic modeling (MSHTM) method to find hidden topics in transcribed interviews more accurately and efficiently than traditional topic modeling methods. MSHTM approaches the data at different scales and performs topic modeling hierarchically, first applying a classical method, Nonnegative Matrix Factorization (NMF), and then a transformer-based method, BERTopic, so that it harnesses the strengths of both. Our method can help researchers and the public better extract and interpret the information in interviews. It also provides insights for new indexing systems that operate at the topic level. We apply our method to real-world interview transcripts and find promising results.
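The hierarchical NMF-then-BERTopic idea can be illustrated with a minimal Python sketch. The abstract does not say how the two stages are connected, so this sketch assumes that a coarse NMF stage partitions the documents by dominant topic and that a finer model then runs inside each partition. The function names `coarse_nmf_groups` and `multi_scale_topics`, the TF-IDF features, and the pluggable `fine_stage` callable are all hypothetical choices for illustration, not the authors' implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, TypeVar

from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

T = TypeVar("T")


def coarse_nmf_groups(
    docs: Sequence[str], n_topics: int, seed: int = 0
) -> Dict[int, List[str]]:
    """Coarse stage (assumed): group documents by their dominant NMF topic."""
    # TF-IDF features are an assumption; the abstract names no representation.
    features = TfidfVectorizer(stop_words="english").fit_transform(docs)
    model = NMF(n_components=n_topics, init="nndsvda",
                random_state=seed, max_iter=500)
    doc_topic = model.fit_transform(features)  # shape: (n_docs, n_topics)

    groups: Dict[int, List[str]] = defaultdict(list)
    for doc, weights in zip(docs, doc_topic):
        groups[int(weights.argmax())].append(doc)
    return dict(groups)


def multi_scale_topics(
    docs: Sequence[str], n_coarse: int, fine_stage: Callable[[List[str]], T]
) -> Dict[int, T]:
    """Run a finer topic model (any callable) inside each coarse NMF group."""
    return {
        topic: fine_stage(group)
        for topic, group in coarse_nmf_groups(docs, n_coarse).items()
    }


# Hypothetical fine stage using BERTopic (requires the `bertopic` package and
# enough documents per group for its clustering step to work):
#
#   from bertopic import BERTopic
#   fine = lambda group: BERTopic().fit_transform(group)[0]
#   result = multi_scale_topics(transcript_segments, n_coarse=5, fine_stage=fine)
```

Keeping the fine stage as a plain callable means the coarse partitioning can be inspected on its own before a heavier transformer-based model is attached.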