In this paper, we introduce the new problem of extracting fine-grained traffic information from Twitter streams, and we make publicly available two constructed traffic-related datasets from Belgium and the Brussels capital region. In particular, we experiment with several models to identify (i) whether a tweet is traffic-related, and (ii) if it is, more fine-grained information about the event (e.g., the type of the event and where it happened). To do so, we frame (i) the identification of traffic-related tweets as a text classification subtask, and (ii) the identification of fine-grained traffic-related information as a slot filling subtask, where each piece of fine-grained information (e.g., where an event happened) is represented as a slot/entity of a particular type. We propose several methods that process the two subtasks either separately or in a joint setting, and we evaluate their effectiveness for solving the traffic event detection problem. Experimental results indicate that the proposed architectures achieve high performance (i.e., more than 95% F$_{1}$ score) on the constructed datasets for both subtasks (i.e., text classification and slot filling), even in a transfer learning scenario. In addition, incorporating tweet-level information into each of the tokens comprising the tweet (for the BERT-based model) can lead to a performance improvement in the joint setting.
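To make the two-subtask framing concrete, the following is a minimal sketch of how a single annotated tweet could be represented: one tweet-level label for the text classification subtask and one tag per token for the slot filling subtask. The BIO tagging scheme, the slot names `what` and `where`, the example tweet, and the helper `extract_slots` are illustrative assumptions, not the annotation scheme or code of the datasets described here.

```python
from typing import List, Tuple


def extract_slots(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (slot_type, text) spans from per-token BIO tags.

    Assumes a BIO scheme: "B-x" opens a slot of type x, "I-x" continues it,
    and "O" marks a token outside any slot. An "I-x" tag that does not
    continue an open slot of the same type is treated like "O".
    """
    slots: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the slot that is currently open, if any.
        if current_type is not None:
            slots.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
            current_type, current_tokens = None, []
    flush()
    return slots


# Hypothetical annotated tweet (not taken from the datasets).
tokens = ["Accident", "on", "the", "E40", "towards", "Brussels"]
tags = ["B-what", "O", "O", "B-where", "I-where", "I-where"]
is_traffic_related = True  # subtask (i): tweet-level text classification

# Subtask (ii): slot filling, decoded only for traffic-related tweets.
slots = extract_slots(tokens, tags) if is_traffic_related else []
print(slots)
```

In a separate setting, the tweet-level label and the token-level tags would come from two independent models; in a joint setting, a single model would predict both for the same input.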