Event detection on social media has attracted considerable research, given the recent availability of large volumes of social media discussions. Previous work on social media event detection either assumes a specific type of event or assumes certain behavior of the observed variables. In this paper, we propose a general method for event detection on social media that makes few assumptions. The main assumption is that when an event occurs, the affected semantic aspects will deviate from their usual behavior. We generalize the representation of time units based on word embeddings of social media text, and propose an algorithm to detect events in time series in a general sense. In the experimental evaluation, we use a novel setting to test whether our method and the baseline methods can exhaustively capture all real-world news in the test period. The evaluation results show that when an event is highly unusual relative to the baseline social media discussion, our method captures it more effectively. Our method is easy to implement and can serve as a starting point for more specific applications.
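To make the main assumption concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that each time unit is represented by the average word embedding of its posts, that each embedding dimension stands in for a "semantic aspect", and that a simple z-score against a sliding window of past time units is an acceptable stand-in for "behaving differently from usual". The function names, the `window` and `threshold` parameters, and the whitespace tokenization are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def time_unit_vector(posts, embeddings, dim):
    """Average the embeddings of all known words in one time unit's posts.

    `embeddings` is assumed to be a dict mapping a word to a list of floats.
    """
    vectors = [
        embeddings[word]
        for post in posts
        for word in post.lower().split()  # naive whitespace tokenization
        if word in embeddings
    ]
    if not vectors:
        return [0.0] * dim
    return [mean(column) for column in zip(*vectors)]


def detect_events(series, window=5, threshold=3.0):
    """Return (time index, deviating dimensions) pairs.

    A dimension deviates when its value lies more than `threshold`
    standard deviations from its mean over the previous `window` units.
    This z-score rule is an illustrative assumption, not the paper's method.
    """
    events = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        flagged = []
        for d, value in enumerate(series[t]):
            past = [vector[d] for vector in history]
            mu = mean(past)
            # Guard against a perfectly flat history (zero deviation).
            sigma = max(pstdev(past), 1e-9)
            if abs(value - mu) / sigma > threshold:
                flagged.append(d)
        if flagged:
            events.append((t, flagged))
    return events
```

In use, one would build one vector per time unit (for example per day) with `time_unit_vector` and pass the resulting list to `detect_events`. Time units whose vectors stay close to their recent history are ignored, while a sudden shift in any dimension is reported together with the dimensions that moved.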