Detecting rare events, defined as events that have a high impact but a low probability of occurring, is a challenge in a number of domains, including meteorology, the environment, finance and economics. Machine learning methods are becoming increasingly popular for detecting such events, since they offer an effective and scalable solution compared to traditional signature-based detection methods. In this work, we begin by undertaking exploratory data analysis and present techniques that can be used in a framework for applying machine learning methods to rare event detection. We also discuss strategies for dealing with class imbalance, including the selection of performance metrics. Despite the popularity of conventional machine learning classifiers, we believe their performance could be further improved, since they are agnostic to the natural order over time in which the events occur. Stochastic processes, on the other hand, model sequences of events by exploiting their temporal structure, such as clustering and dependence between the different types of events. We develop a classification model based on Hawkes processes and apply it to a dataset of e-commerce transactions. The model not only achieves better predictive performance but also allows us to draw inferences about the temporal dynamics of the data.
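To illustrate the temporal clustering that a Hawkes process captures, the following is a minimal sketch of the conditional intensity of a univariate Hawkes process with an exponential kernel. It is not the classification model developed in this work, whose details are not given in the abstract. The function name `hawkes_intensity`, the parameter values `mu`, `alpha` and `beta`, and the example event times are all assumptions chosen for illustration.

```python
import math


def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a univariate Hawkes process (exponential kernel).

    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))

    mu    : baseline rate of events (hypothetical value)
    alpha : jump in intensity caused by each past event (hypothetical value)
    beta  : rate at which the influence of a past event decays (hypothetical value)
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in event_times if t_i < t
    )
    return mu + excitation


# Hypothetical event times: a burst of three events, then an isolated one.
events = [1.0, 1.2, 1.3, 5.0]

# Shortly after the burst, the intensity is well above the baseline.
just_after_burst = hawkes_intensity(1.4, events)

# Long after the burst, the intensity has decayed back towards the baseline.
long_after_burst = hawkes_intensity(4.9, events)
```

Each past event raises the intensity by `alpha`, and that contribution decays at rate `beta`, so events that arrive in quick succession make further events more likely in the near term. A classifier that ignores event order cannot represent this self-exciting behaviour.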