Language in social media is extremely dynamic: new words emerge, trend, and disappear, while the meanings of existing words fluctuate over time. Such dynamics are especially notable during periods of crisis. This work addresses several important tasks: measuring, visualizing, and predicting short-term text representation shift, i.e., the change in a word's contextual semantics, and contrasting such shift with surface-level word dynamics, or concept drift, observed in social media streams. Unlike previous approaches to learning word representations from text, we study the relationship between short-term concept drift and representation shift on a large social media corpus: VKontakte posts in Russian collected during the Russia-Ukraine crisis in 2014-2015. Our novel contributions include quantitative and qualitative approaches to (1) measure short-term representation shift and contrast it with surface-level concept drift; (2) build predictive models that forecast short-term shifts in meaning from previous meaning as well as from concept drift; and (3) visualize short-term representation shift for example keywords, demonstrating the practical use of our approach for discovering and tracking the meaning of newly emerging terms in social media. We show that short-term representation shift can be accurately predicted up to several weeks in advance. Our unique approach to modeling and visualizing word representation shifts in social media can be used to explore and characterize specific aspects of the streaming corpus during crisis events, and could potentially improve downstream classification tasks, including real-time event detection.
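To make the two quantities contrasted in the abstract concrete, the sketch below shows one common way they could be computed. It is an illustration only, not the method reported in this work. It assumes that representation shift is measured as the cosine distance between a word's embedding in consecutive time windows, and that surface-level concept drift is approximated by the change in the word's relative frequency. The function names, the weekly windowing, and the toy data are all hypothetical, and the vectors are assumed to live in a shared space across windows (for example, from an incrementally updated model).

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def representation_shift(
    embeddings_by_week: List[Dict[str, Sequence[float]]], word: str
) -> List[float]:
    """Cosine distance between the word's vectors in consecutive weeks.

    Assumption: all weekly vectors share one embedding space.
    Weeks where the word is missing on either side yield NaN.
    """
    shifts = []
    for prev, curr in zip(embeddings_by_week, embeddings_by_week[1:]):
        if word in prev and word in curr:
            shifts.append(cosine_distance(prev[word], curr[word]))
        else:
            shifts.append(float("nan"))
    return shifts


def frequency_drift(counts_by_week: List[Dict[str, int]], word: str) -> List[float]:
    """Week-over-week change in the word's relative frequency.

    Assumption: relative frequency stands in for surface-level concept drift.
    """
    relative = []
    for counts in counts_by_week:
        total = sum(counts.values())
        relative.append(counts.get(word, 0) / total if total else 0.0)
    return [later - earlier for earlier, later in zip(relative, relative[1:])]


# Toy data (hypothetical): three weeks of 2-d vectors and two weeks of counts.
toy_embeddings = [{"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}, {"w": [0.0, 2.0]}]
toy_counts = [{"w": 1, "x": 3}, {"w": 2, "x": 2}]

toy_shifts = representation_shift(toy_embeddings, "w")
toy_drift = frequency_drift(toy_counts, "w")
```

In this toy data the vector rotates between the first two weeks and only changes length between the last two, so the first shift is large and the second is zero. Series like these could then serve as inputs to a forecasting model of the kind the abstract describes.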