Public transport agencies use social media as an essential tool for communicating mobility incidents to passengers. However, while short-term, day-to-day information about transport phenomena is usually posted on social media with low latency, it remains available only briefly, because the content is rarely made available in an aggregated form. Social media communication of transport phenomena also usually lacks GIS annotations, as most social media platforms do not allow attaching non-POI GPS coordinates to posts. As a result, analysis of transport phenomena information is minimal. We collected three years of social media posts from a Polish public transport company, together with user comments. Through exploration, we infer a six-class transport information typology. We build an information type classifier for social media posts, detect stop names in posts, and relate them to GPS coordinates, obtaining a spatial understanding of long-term aggregated phenomena. We show that our approach enables citizen science and use it to analyze the impact of three years of infrastructure incidents on passenger mobility, as well as the sentiment and scale of reaction toward each event. All these results are achieved for Polish, an under-resourced language when it comes to spatial language understanding, especially in social media contexts. To improve the situation, we release two of our annotated data sets: social media posts with incident type labels and matched stop names, and social media comments with annotated sentiment. We also open-source the experimental codebase.
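As a rough illustration of the step that detects stop names in posts and relates them to GPS coordinates, the following minimal sketch performs a gazetteer lookup. It is not the released codebase: the stop names, the coordinates, the sample post, and the function name `find_stops` are all hypothetical, and exact string matching is a simplifying assumption that ignores Polish inflection.

```python
import re

# Hypothetical gazetteer: stop name -> (latitude, longitude).
# Names and coordinates are placeholders, not real data.
STOPS = {
    "Dworzec Testowy": (52.0001, 17.0001),
    "Plac Przykladowy": (52.0002, 17.0002),
    "Dworzec": (52.0003, 17.0003),
}


def find_stops(text, stops):
    """Return [(stop_name, (lat, lon))] for gazetteer names found in text,
    ordered by position. Longer names are matched first so that a short
    name does not also fire inside a longer one."""
    lowered = text.casefold()
    taken = []  # character spans already claimed by a match
    hits = []
    for name in sorted(stops, key=len, reverse=True):
        # Require non-word characters (or string edges) around the name.
        pattern = r"(?<!\w)" + re.escape(name.casefold()) + r"(?!\w)"
        for m in re.finditer(pattern, lowered):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue  # overlaps a longer match found earlier
            taken.append((m.start(), m.end()))
            hits.append((m.start(), name))
    return [(name, stops[name]) for _, name in sorted(hits)]


# Hypothetical post text in the style of an incident announcement.
post = "Awaria: Dworzec Testowy, objazd przez Plac Przykladowy."
matches = find_stops(post, STOPS)
```

Here `matches` pairs each detected stop with its coordinates, which is the form needed to aggregate incidents spatially. A realistic system for Polish would likely need lemmatization or fuzzy matching, since stop names are inflected in running text.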