Predicting Terrorist Attacks in the United States using Localized News Data

Dozens of terrorist attacks are perpetrated in the United States every year, often causing fatalities and other significant damage. To better understand and mitigate these attacks, we present a set of machine learning models that learn from localized news data in order to predict whether a terrorist attack will occur on a given calendar date and in a given state. The best model, a Random Forest that learns from a novel variable-length moving average representation of the feature space, achieves area under the receiver operating characteristic curve (AUROC) scores $> 0.667$ on four of the five states that were impacted most by terrorism between 2015 and 2018. Our key findings include that modeling terrorism as a set of independent events, rather than as a continuous process, is a fruitful approach, especially when the events are sparse and dissimilar. Additionally, our results highlight the need for localized models that account for differences between locations. From a machine learning perspective, we found that the Random Forest model outperformed several deep models on our multimodal, noisy, and imbalanced data set, thus demonstrating the efficacy of our novel feature representation method in such a context. We also show that its predictions are relatively robust to time gaps between attacks and to observed characteristics of the attacks. Finally, we analyze factors that limit model performance, which include a noisy feature space and the small amount of available data. These contributions provide an important foundation for the use of machine learning in efforts against terrorism in the United States and beyond.
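The abstract does not spell out how the variable-length moving average representation is built, so the following is only a minimal sketch of one plausible reading: each daily feature is summarized by trailing averages over several window lengths, and predictions are scored with AUROC. The function names `trailing_moving_averages` and `auroc`, the window lengths, and the sample numbers are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def trailing_moving_averages(
    series: Sequence[float], windows: Sequence[int]
) -> List[Dict[int, float]]:
    """For each day t, average the values of the days strictly before t.

    One average is computed per window length. When fewer than `w` past
    days exist, the available history is used; day 0 has no history and
    gets 0.0. (Assumed scheme, not the authors' exact method.)
    """
    rows = []
    for t in range(len(series)):
        row = {}
        for w in windows:
            history = series[max(0, t - w):t]  # excludes day t itself
            row[w] = sum(history) / len(history) if history else 0.0
        rows.append(row)
    return rows


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive example outscores
    a random negative one, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up daily counts of some news signal for one state.
daily_counts = [0, 2, 4, 0, 6]
features = trailing_moving_averages(daily_counts, windows=[1, 3])

# Made-up attack labels and model scores for four (date, state) pairs.
score = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because each average uses only days before the prediction date, no information from the target day leaks into the features. A score of 0.5 corresponds to random ranking, which puts the reported $> 0.667$ threshold in context.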

