User-generated content from social media is produced in many languages, which makes it technically challenging to compare the themes discussed within one domain across different cultures and regions. Such comparisons are relevant for domains in a globalized world, such as market research, where people from two nations and markets might have different requirements for a product. We propose a simple, modern, and effective method for building a single topic model with sentiment analysis capable of covering multiple languages simultaneously, based on a pre-trained state-of-the-art deep neural network for natural language understanding. To demonstrate its feasibility, we apply the model to newspaper articles and user comments from a specific domain, namely organic food products and related consumption behavior. The themes match across languages. Additionally, we obtain a high proportion of stable and domain-relevant topics, a meaningful relation between topics and their respective textual contents, and an interpretable representation for social media documents. Marketing can potentially benefit from our method, since it provides an easy-to-use means of addressing specific customer interests from different market regions around the globe. For reproducibility, we provide the code, data, and results of our study.