A General Method for Event Detection on Social Media

Event detection on social media has attracted considerable research, given the recent availability of large volumes of social media discussions. Previous work on social media event detection either assumes a specific type of event or assumes certain behavior of the observed variables. In this paper, we propose a general method for event detection on social media that makes few assumptions. The main assumption is that when an event occurs, the affected semantic aspects behave differently from their usual behavior. We generalize the representation of time units based on word embeddings of social media text, and propose an algorithm that detects events in time series in a general sense. In the experimental evaluation, we use a novel setting to test whether our method and the baseline methods can exhaustively catch all real-world news in the test period. The evaluation results show that when an event is highly unusual relative to the base social media discussion, our method captures it more effectively. Our method is easy to implement and can serve as a starting point for more specific applications.
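The abstract does not spell out the algorithm, so the following is only a minimal sketch of its stated assumption (affected semantic aspects deviate from their usual behavior), not the paper's method. It assumes, purely for illustration, that a time unit is represented by the mean of its token embeddings and that "unusual" means a large z-score against a sliding window of preceding time units. The names `unit_vector`, `detect_events`, `history`, and `threshold` are hypothetical.

```python
from statistics import mean, stdev


def unit_vector(token_vectors):
    """Represent one time unit as the mean of its token embeddings (an assumption)."""
    dims = len(token_vectors[0])
    return [mean(v[d] for v in token_vectors) for d in range(dims)]


def detect_events(unit_vectors, history=5, threshold=3.0):
    """Flag time units where an embedding dimension departs from its recent baseline.

    Each dimension stands in for one "semantic aspect" (an assumption).
    Returns a list of (time_index, dimension, z_score) tuples.
    """
    events = []
    for t in range(history, len(unit_vectors)):
        window = unit_vectors[t - history:t]  # the preceding `history` time units
        for d in range(len(unit_vectors[t])):
            past = [u[d] for u in window]
            mu, sigma = mean(past), stdev(past)
            if sigma == 0:
                continue  # flat baseline: skip to avoid dividing by zero
            z = (unit_vectors[t][d] - mu) / sigma
            if abs(z) >= threshold:
                events.append((t, d, z))
    return events


# Synthetic two-dimensional series: dimension 1 jumps at the last time unit.
series = [
    [0.10, 0.50], [0.12, 0.52], [0.09, 0.49], [0.11, 0.51],
    [0.10, 0.50], [0.11, 0.50], [0.10, 0.90],
]
flagged = detect_events(series)
```

A per-dimension z-score is the simplest stand-in for "behaving differently from usual"; a real application would likely need a more robust baseline, and the window length and threshold here are arbitrary.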


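Related topic: word vector representations

A distributed representation expresses language as dense, low-dimensional, continuous vectors. Researchers first noticed that learned word embeddings exhibit analogy relations, for example apple − apples ≈ car − cars and man − woman ≈ king − queen. Such embedding methods can be trained directly on large unlabeled corpora. The quality of the embeddings also depends heavily on the choice of context window size: a large context window generally yields embeddings that reflect topical information, while a small context window yields embeddings that reflect a word's function and contextual semantics.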
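The analogy relation described above can be checked by comparing difference vectors. The sketch below uses made-up three-dimensional toy vectors (not learned embeddings) only to show the arithmetic; the helper names `sub` and `cosine` are hypothetical.

```python
import math


def sub(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors invented for illustration; real embeddings have many more dimensions.
toy = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king": [3.0, 0.1, 0.9],
    "queen": [3.1, 1.0, 0.8],
}

# If man - woman ≈ king - queen, the two difference vectors point the same way.
similarity = cosine(sub(toy["man"], toy["woman"]), sub(toy["king"], toy["queen"]))
```

A similarity close to 1 means the two offsets are nearly parallel, which is what the analogy relation claims for well-trained embeddings.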
