There is increasing interest in working with user-generated content in social media, especially textual posts over time. Currently there is no consistent approach to segmenting user posts into timelines in a meaningful way that improves the quality and reduces the cost of manual annotation. Here we propose a set of methods for segmenting longitudinal user posts into timelines likely to contain interesting moments of change in a user's behaviour, based on their online posting activity. We also propose a novel framework for evaluating timelines and show its applicability in the context of two different social media datasets. Finally, we present a discussion of the linguistic content of highly ranked timelines.