Terrorism is a major problem worldwide, causing thousands of fatalities and billions of dollars in damage every year. To better understand and mitigate these attacks, we present a set of machine learning models that learn from localized news data to predict whether a terrorist attack will occur on a given calendar date in a given state. The best model, a Random Forest that learns from a novel variable-length moving average representation of the feature space, achieves area under the receiver operating characteristic curve (AUROC) scores above 0.667 on four of the five states most affected by terrorism between 2015 and 2018. Our key findings include that modeling terrorism as a set of independent events, rather than as a continuous process, is a fruitful approach, especially when the events are sparse and dissimilar. Additionally, our results highlight the need for localized models that account for differences between locations. From a machine learning perspective, we found that the Random Forest model outperformed several deep models on our multimodal, noisy, and imbalanced data set, which demonstrates the efficacy of our novel feature representation method in such a context. We also show that its predictions are relatively robust to the time gaps between attacks and to the observed characteristics of the attacks. Finally, we analyze factors that limit model performance, including a noisy feature space and a small amount of available data. These contributions provide an important foundation for the use of machine learning in efforts against terrorism in the United States and beyond.
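The abstract does not describe how the variable-length moving average representation is constructed. The sketch below is therefore only an illustrative assumption and not the authors' method. It computes trailing moving averages of a daily news-derived signal over several window lengths, giving one feature per window. It also shows how an AUROC score is computed, as the probability that a randomly chosen positive (attack) day receives a higher score than a randomly chosen negative day, with ties counted as one half. Under this reading, an AUROC of 0.667 means that a positive day outranks a negative day about two times in three. The function names `trailing_moving_averages` and `auroc` are hypothetical.

```python
from typing import Dict, List, Sequence


def trailing_moving_averages(
    daily_values: Sequence[float], windows: Sequence[int]
) -> List[Dict[int, float]]:
    """Hypothetical feature builder: for each day t and each window length w,
    average the values on days t-w+1..t (inclusive).

    Near the start of the series fewer than w days exist, so the average is
    taken over the available days (an illustrative choice). To predict day
    t+1 without leakage, use the row computed for day t.
    """
    features: List[Dict[int, float]] = []
    for t in range(len(daily_values)):
        row: Dict[int, float] = {}
        for w in windows:
            start = max(0, t - w + 1)
            span = daily_values[start : t + 1]
            row[w] = sum(span) / len(span)
        features.append(row)
    return features


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive example scores higher; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy daily signal (e.g. a hypothetical count of relevant news articles).
    rows = trailing_moving_averages([1, 2, 3, 4], windows=[1, 3])
    print(rows[3])  # {1: 4.0, 3: 3.0}
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise AUROC loop runs in O(P·N) time, which is acceptable for a sketch. A library routine such as scikit-learn's `roc_auc_score` would be the practical choice on real data.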