Predicting Terrorist Attacks in the United States using Localized News Data

Terrorism is a major problem worldwide, causing thousands of fatalities and billions of dollars in damage every year. To better understand and mitigate these attacks, we present a set of machine learning models that learn from localized news data in order to predict whether a terrorist attack will occur on a given calendar date and in a given state. The best model, a Random Forest that learns from a novel variable-length moving average representation of the feature space, achieves area under the receiver operating characteristic curve (AUROC) scores $> 0.667$ on four of the five states that were impacted most by terrorism between 2015 and 2018. Our key findings include that modeling terrorism as a set of independent events, rather than as a continuous process, is a fruitful approach, especially when the events are sparse and dissimilar. Additionally, our results highlight the need for localized models that account for differences between locations. From a machine learning perspective, we found that the Random Forest model outperformed several deep models on our multimodal, noisy, and imbalanced data set, thus demonstrating the efficacy of our novel feature representation method in such a context. We also show that its predictions are relatively robust to time gaps between attacks and to observed characteristics of the attacks. Finally, we analyze factors that limit model performance, which include a noisy feature space and a small amount of available data. These contributions provide an important foundation for the use of machine learning in efforts against terrorism in the United States and beyond.
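The abstract does not spell out how the variable-length moving average representation is built, so the following is only a minimal sketch of one plausible reading: each daily news-derived signal is summarized by trailing averages over several window lengths, and predictions are scored with AUROC. The function names, the window lengths, and the toy numbers are all assumptions for illustration, not the authors' implementation, and the classifier itself (a Random Forest in the paper) is left out.

```python
from typing import Dict, List, Sequence


def trailing_moving_averages(
    daily_values: Sequence[float], windows: Sequence[int]
) -> List[Dict[int, float]]:
    """For each day t, average the values of the w days before t (day t excluded).

    Hypothetical helper: days with no history get 0.0 for every window.
    """
    features = []
    for t in range(len(daily_values)):
        row = {}
        for w in windows:
            past = daily_values[max(0, t - w):t]  # only days strictly before t
            row[w] = sum(past) / len(past) if past else 0.0
        features.append(row)
    return features


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random positive day outscores a random negative day.

    Ties count as half a win, which matches the usual AUROC definition.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up numbers: one signal over four days, windows of 1 and 3 days.
feats = trailing_moving_averages([1.0, 3.0, 5.0, 7.0], windows=[1, 3])
score = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Excluding the current day from each window keeps the features free of same-day information, which matters when the target is whether an attack occurs on that date. The pairwise AUROC above is quadratic in the number of days, which is fine for a sketch but would normally be replaced by a rank-based computation on larger data.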

