In this paper, we investigate the conversion of a Twitter corpus into geo-referenced raster cells, each holding the probability that the associated geographical area is flooded. We describe a baseline approach that combines a density ratio function, aggregation using a spatio-temporal Gaussian kernel function, and TF-IDF textual features. The features are transformed into probabilities using a logistic regression model. The method is evaluated on a corpus collected after the floods that followed Hurricane Harvey in the Houston urban area in August-September 2017. The baseline reaches an F1 score of 68%. We highlight research directions likely to improve these initial results.
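To make the aggregation and probability steps concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes a separable spatio-temporal Gaussian kernel and a kernel-weighted mean of per-tweet feature vectors (for example TF-IDF scores), followed by a logistic function. The names `gaussian_weight`, `aggregate_cell`, and `flood_probability`, the bandwidths `sigma_s_km` and `sigma_t_h`, and all coefficient values are hypothetical. The density ratio function is omitted.

```python
import math


def gaussian_weight(dx_km, dy_km, dt_h, sigma_s_km=1.0, sigma_t_h=6.0):
    """Unnormalised separable Gaussian kernel over space (km) and time (hours).

    The bandwidths are illustrative assumptions, not values from the paper.
    """
    spatial = math.exp(-(dx_km ** 2 + dy_km ** 2) / (2.0 * sigma_s_km ** 2))
    temporal = math.exp(-(dt_h ** 2) / (2.0 * sigma_t_h ** 2))
    return spatial * temporal


def aggregate_cell(cell, tweets, **kernel_params):
    """Kernel-weighted mean of tweet feature vectors for one raster cell.

    cell   : (x_km, y_km, t_h) centre of the cell in space and time
    tweets : non-empty list of (x_km, y_km, t_h, features), where features
             is a list of floats (for example TF-IDF scores)
    """
    cx, cy, ct = cell
    dim = len(tweets[0][3])
    total = [0.0] * dim
    weight_sum = 0.0
    for x, y, t, feats in tweets:
        w = gaussian_weight(x - cx, y - cy, t - ct, **kernel_params)
        weight_sum += w
        for i, f in enumerate(feats):
            total[i] += w * f
    if weight_sum == 0.0:
        # Every tweet is too far away to contribute; return a zero vector.
        return total
    return [v / weight_sum for v in total]


def flood_probability(features, coef, intercept):
    """Logistic model mapping aggregated features to a probability."""
    z = intercept + sum(c * f for c, f in zip(coef, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Two hypothetical tweets with two-dimensional feature vectors.
    tweets = [
        (0.2, 0.1, 1.0, [0.9, 0.1]),   # close to the cell, flood-related terms
        (5.0, 4.0, 30.0, [0.0, 0.8]),  # far away in space and time
    ]
    cell_features = aggregate_cell((0.0, 0.0, 0.0), tweets)
    # Hypothetical coefficients; in practice they would be fitted on labelled cells.
    print(flood_probability(cell_features, coef=[2.0, -1.0], intercept=-0.5))
```

In this sketch a tweet's influence on a cell decays smoothly with its distance in space and time, so nearby, recent tweets dominate the aggregated feature vector of the cell.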