Customer reviews and comments help businesses understand how users feel about their products and services. To support efficient customer assistance, however, this data must be analyzed to determine the sentiment associated with specific topics or aspects. LDA and LSA fail to capture semantic relationships and are not tailored to any domain. In this study, we evaluate BERTopic, a novel method that generates topics from sentence embeddings, on Consumer Financial Protection Bureau (CFPB) data. Our results show that BERTopic is flexible and yields more meaningful and diverse topics than LDA and LSA. Furthermore, domain-specific pre-trained embeddings (FinBERT) yield even better topics. We evaluate the topics using two coherence measures, c_v and UMass.
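To make the UMass measure concrete, the following is a minimal sketch of one common formulation: for a topic's top words ordered from most to least probable, sum log((D(w_m, w_l) + 1) / D(w_l)) over all pairs with l < m, where D counts the documents that contain the given words. The function name `umass_coherence` and the toy documents are hypothetical illustrations, not the study's code or CFPB data, and library implementations may differ in details such as averaging over pairs.

```python
import math


def umass_coherence(top_words, documents):
    """UMass-style coherence for a single topic (illustrative sketch).

    top_words: topic words ordered from most to least probable.
    documents: list of tokenized documents (each a list of strings).
    Scores closer to zero indicate words that co-occur more often.
    """
    # Sets make per-document membership checks cheap.
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                raise ValueError(f"word not found in corpus: {top_words[l]!r}")
            d_ml = doc_freq(top_words[m], top_words[l])
            # The +1 smoothing avoids log(0) for pairs that never co-occur.
            score += math.log((d_ml + 1) / d_l)
    return score


# Hypothetical toy corpus, for illustration only.
toy_docs = [
    ["credit", "card", "fee"],
    ["credit", "card", "late"],
    ["mortgage", "loan", "payment"],
    ["loan", "payment", "late"],
]

coherent = umass_coherence(["credit", "card"], toy_docs)
incoherent = umass_coherence(["credit", "mortgage"], toy_docs)
print(coherent > incoherent)
```

In this toy corpus, "credit" and "card" always appear together while "credit" and "mortgage" never do, so the first topic receives the higher score.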