The growth of social media has increased the amount of content that users share. These platforms give people a place to report a wide range of real-life events. Detecting such events with natural language processing has attracted researchers' attention, and various algorithms have been developed for this purpose. In this paper, we propose a Semantic Modular Model (SMM) consisting of five modules: Distributional Denoising Autoencoder, Incremental Clustering, Semantic Denoising, Defragmentation, and Ranking and Processing. The proposed model aims to (1) cluster documents while ignoring those that might not contribute to the identification of events, and (2) identify more important and descriptive keywords. Compared with state-of-the-art methods, the proposed model performs better at identifying lower-ranked events and at extracting keywords for more important events on three English Twitter datasets: FACup, SuperTuesday, and USElection. The proposed method outperforms the best reported results on the mean keyword-precision metric by 7.9\%.