Dozens of terrorist attacks are perpetrated in the United States every year, often causing fatalities and other significant damage. To better understand and mitigate these attacks, we present a set of machine learning models that learn from localized news data to predict whether a terrorist attack will occur on a given calendar date in a given state. The best model, a Random Forest that learns from a novel variable-length moving average representation of the feature space, achieves area under the receiver operating characteristic curve (AUROC) scores $> 0.667$ on four of the five states most affected by terrorism between 2015 and 2018. One of our key findings is that modeling terrorism as a set of independent events, rather than as a continuous process, is a fruitful approach, especially when the events are sparse and dissimilar. Additionally, our results highlight the need for localized models that account for differences between locations. From a machine learning perspective, we found that the Random Forest model outperformed several deep models on our multimodal, noisy, and imbalanced data set, demonstrating the efficacy of our novel feature representation method in such a context. We also show that its predictions are relatively robust to time gaps between attacks and to observed characteristics of the attacks. Finally, we analyze factors that limit model performance, including a noisy feature space and the small amount of available data. These contributions provide an important foundation for the use of machine learning in efforts against terrorism in the United States and beyond.
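The abstract does not define the variable-length moving average, so the following is only a minimal sketch under an assumed reading that fits the "independent events" framing: each day's feature vector is averaged over the days since the most recent attack in that state, so the window length varies with the gap between attacks. The function names `variable_length_moving_average` and `auroc` are hypothetical and this is not the authors' implementation. The sketch also computes AUROC pairwise to make the reported scores concrete: an AUROC of 0.667 means that a randomly chosen attack day is ranked above a randomly chosen non-attack day about two-thirds of the time. Training the classifier itself (for example with scikit-learn's `RandomForestClassifier`) is omitted.

```python
from typing import List, Sequence


def variable_length_moving_average(
    daily_features: Sequence[Sequence[float]],
    attack_days: Sequence[bool],
) -> List[List[float]]:
    """HYPOTHETICAL reading of a variable-length moving average.

    For day t, average the feature vectors from the day after the most recent
    attack before t (or from day 0 if there was none) through day t inclusive.
    Assumes day t's news features are available before predicting day t.
    """
    if len(daily_features) != len(attack_days):
        raise ValueError("daily_features and attack_days must have the same length")
    averaged: List[List[float]] = []
    window_start = 0  # first day of the current inter-attack window
    for t in range(len(daily_features)):
        window = daily_features[window_start : t + 1]
        n = len(window)
        # Column-wise mean over the current window.
        averaged.append([sum(column) / n for column in zip(*window)])
        if attack_days[t]:
            # An attack ends this window; the next day starts a fresh one.
            window_start = t + 1
    return averaged


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    features = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [2.0, 2.0]]
    attacks = [False, True, False, False]
    print(variable_length_moving_average(features, attacks))
    # → [[1.0, 0.0], [2.0, 1.0], [5.0, 4.0], [3.5, 3.0]]
    print(auroc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2]))
    # → 0.75
```

Resetting the window at each attack treats the period between attacks as one unit rather than smoothing across events with a fixed-length window; whether this matches the paper's exact definition is an assumption.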