Predicting Terrorist Attacks in the United States using Localized News Data

Paper title

Predicting Terrorist Attacks in the United States using Localized News Data

Authors

Krieg, Steven J., Smith, Christian W., Chatterjee, Rusha, Chawla, Nitesh V.

Abstract

Terrorism is a major problem worldwide, causing thousands of fatalities and billions of dollars in damage every year. Toward the end of better understanding and mitigating these attacks, we present a set of machine learning models that learn from localized news data in order to predict whether a terrorist attack will occur on a given calendar date and in a given state. The best model--a Random Forest that learns from a novel variable-length moving average representation of the feature space--achieves area under the receiver operating characteristic scores $> 0.667$ on four of the five states that were impacted most by terrorism between 2015 and 2018. Our key findings include that modeling terrorism as a set of independent events, rather than as a continuous process, is a fruitful approach--especially when the events are sparse and dissimilar. Additionally, our results highlight the need for localized models that account for differences between locations. From a machine learning perspective, we found that the Random Forest model outperformed several deep models on our multimodal, noisy, and imbalanced data set, thus demonstrating the efficacy of our novel feature representation method in such a context. We also show that its predictions are relatively robust to time gaps between attacks and observed characteristics of the attacks. Finally, we analyze factors that limit model performance, which include a noisy feature space and a small amount of available data. These contributions provide an important foundation for the use of machine learning in efforts against terrorism in the United States and beyond.
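The abstract names two technical ingredients without detailing them: a moving-average feature representation and the area under the receiver operating characteristic curve (AUROC). As a rough illustration only, the sketch below computes trailing means over several window lengths for a daily signal and scores a ranking with AUROC. It is not the authors' method: the function names, the window lengths, the toy data, and the reading of "variable-length" as "several fixed window lengths" are all assumptions made for this example, and no Random Forest is trained here.

```python
from typing import List, Sequence


def trailing_means(series: Sequence[float], windows: Sequence[int]) -> List[List[float]]:
    """For each day t, return one trailing mean per window length.

    Only days up to and including t are used, so no future information
    leaks into the features. Early days use however many values exist.
    """
    features = []
    for t in range(len(series)):
        row = []
        for w in windows:
            start = max(0, t - w + 1)
            chunk = series[start:t + 1]
            row.append(sum(chunk) / len(chunk))
        features.append(row)
    return features


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: a daily news-derived count and a binary attack label.
daily_signal = [0, 1, 0, 3, 4, 1, 0, 0, 5, 2]
attack_label = [0, 0, 0, 1, 1, 0, 0, 0, 1, 0]

# Window lengths of 1, 3, and 7 days are arbitrary choices for illustration.
X = trailing_means(daily_signal, windows=[1, 3, 7])

# Use the 3-day mean as a stand-in score; a real model would be trained on X.
scores = [row[1] for row in X]
print(auroc(attack_label, scores))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the $> 0.667$ figure reported above. Because the metric depends only on how days are ranked, it remains informative on imbalanced data where attack days are rare.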
