The present study proposes a novel method of trend detection and visualization: modeling how a topic changes over time. Current models for identifying and visualizing trends convey only the popularity of a single word, based on stochastic counting of its usage; the approach in the present study instead illustrates both the popularity of a topic and the direction in which it is moving. The direction in this case is a distinct subtopic within the selected corpus. Such trends are generated by modeling the movement of a topic, using k-means clustering and cosine similarity to track the distances between clusters over time. In a convergent scenario, it can be inferred that the topics as a whole are meshing (tokens are becoming interchangeable between topics). Conversely, a divergent scenario implies that each topic's respective tokens are no longer found in the same context (the words are becoming increasingly different from each other). The methodology was tested on a group of articles from various media houses present in the 20 Newsgroups dataset.
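The convergence and divergence idea can be illustrated with a minimal sketch. It is not the study's implementation: it assumes that tokens have already been embedded as vectors and assigned to topic clusters (for example by k-means) in each time slice, and it covers only the step of tracking cosine similarity between two cluster centroids over time. The function names, the `tolerance` parameter, and the toy two-dimensional vectors are all hypothetical.

```python
import math


def centroid(vectors):
    """Return the component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_series(topic_a_slices, topic_b_slices):
    """Centroid-to-centroid cosine similarity for each time slice."""
    return [
        cosine_similarity(centroid(a), centroid(b))
        for a, b in zip(topic_a_slices, topic_b_slices)
    ]


def classify_trend(series, tolerance=1e-6):
    """Label the pair of topics by comparing the first and last similarity.

    Hypothetical rule: rising similarity is read as convergent,
    falling similarity as divergent, anything else as stable.
    """
    change = series[-1] - series[0]
    if change > tolerance:
        return "convergent"
    if change < -tolerance:
        return "divergent"
    return "stable"


# Toy data (invented for illustration): three time slices, two token
# vectors per topic per slice. Topic A stays put; topic B drifts toward A.
topic_a = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[1.0, 0.0], [0.9, 0.1]],
    [[1.0, 0.0], [0.9, 0.1]],
]
topic_b = [
    [[0.0, 1.0], [0.1, 0.9]],
    [[0.4, 0.6], [0.5, 0.5]],
    [[0.8, 0.2], [0.9, 0.1]],
]

series = similarity_series(topic_a, topic_b)
trend = classify_trend(series)
print(series, trend)
```

Comparing only the first and last slice is the simplest possible rule; a fitted slope over the whole series would be less sensitive to noise in any single slice.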