In this paper, we propose a framework for detecting topics in social media based on Human Word Association (HWA). Identifying the topics discussed in these media has become a critical challenge. Most work in this area addresses English, and far less has been done for Persian, especially for microblogs written in Persian. Moreover, existing work has focused on mining frequent patterns or semantic relationships and has largely ignored structural methods of language. The proposed framework uses HWA, a method that imitates the human mental ability to associate words. The method computes the Associative Gravity Force, a measure of how strongly words are related, and uses it to build a graph. Topics are then extracted by embedding this graph and applying clustering methods. We applied the approach to a Persian-language dataset collected from Telegram and performed several experimental studies to evaluate its performance. The experimental results show that it outperforms other topic detection methods.
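The pipeline outlined in the abstract (score word associations, build a graph, embed it, cluster the embedding) can be illustrated with a minimal Python sketch. The abstract does not define the Associative Gravity Force, so the score below is a hypothetical gravity-style stand-in (product of word frequencies divided by the squared mean token distance), not the paper's formula. Treating words as graph nodes, the adjacency-row embedding, the plain k-means step, the toy English posts, and all function names are likewise assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def association_scores(docs):
    """Stand-in gravity-style score (an assumption, not the paper's definition):
    freq(a) * freq(b) / mean_distance(a, b) ** 2."""
    freq = Counter(w for doc in docs for w in doc)
    dist_sum, dist_cnt = defaultdict(float), defaultdict(int)
    for doc in docs:
        for (i, a), (j, b) in combinations(enumerate(doc), 2):
            if a != b:
                pair = tuple(sorted((a, b)))
                dist_sum[pair] += j - i  # token distance within one post
                dist_cnt[pair] += 1
    return {
        pair: freq[pair[0]] * freq[pair[1]] / (dist_sum[pair] / dist_cnt[pair]) ** 2
        for pair in dist_sum
    }


def embed(scores):
    """Toy embedding: each word's L2-normalised row of the weighted adjacency matrix."""
    vocab = sorted({w for pair in scores for w in pair})
    index = {w: i for i, w in enumerate(vocab)}
    rows = [[0.0] * len(vocab) for _ in vocab]
    for (a, b), s in scores.items():
        rows[index[a]][index[b]] = rows[index[b]][index[a]] = s
    for i, row in enumerate(rows):
        row[i] = max(row)  # self-weight so a word overlaps with its neighbours
        norm = math.sqrt(sum(x * x for x in row))
        rows[i] = [x / norm for x in row]
    return vocab, rows


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def detect_topics(docs, k):
    """Score -> graph -> embedding -> clusters; returns one set of words per topic."""
    vocab, vectors = embed(association_scores(docs))
    labels = kmeans(vectors, k)
    topics = defaultdict(set)
    for word, lab in zip(vocab, labels):
        topics[lab].add(word)
    return sorted(topics.values(), key=lambda t: sorted(t))


# Toy tokenised posts (English placeholders, not the Telegram dataset).
posts = [
    ["election", "vote", "parliament"],
    ["vote", "parliament", "candidate"],
    ["football", "goal", "league"],
    ["goal", "league", "match"],
]
topics = detect_topics(posts, k=2)
for topic in topics:
    print(sorted(topic))
```

In this sketch the number of topics `k` is supplied by hand. A practical system would more likely use a learned graph embedding and a clustering method that does not need `k` fixed in advance; the abstract does not say which ones the authors chose.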