ATM Fraud Detection using Streaming Data Analytics

Gaining the trust and confidence of customers is essential to the growth and success of financial institutions and organizations. Of late, the financial industry has been significantly affected by numerous instances of fraudulent activity. Further, because large, voluminous datasets are generated, it is essential that the underlying framework is scalable and meets real-time needs. To address this issue, in this study we propose ATM fraud detection in both static and streaming contexts. In the static context, we investigated a parallel and scalable machine learning framework for ATM fraud detection that is built on Spark and trained with a variety of machine learning (ML) models, including Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Gradient Boosting Tree (GBT), and Multi-Layer Perceptron (MLP). We also employed several balancing techniques, such as the Synthetic Minority Oversampling Technique (SMOTE) and its variants and Generative Adversarial Networks (GAN), to address the rarity of fraud cases in the dataset. In the streaming context, we propose a sliding-window-based method that collects the ATM transactions performed within a specified time interval and then uses them to train several ML models, including NB, RF, DT, and K-Nearest Neighbour (KNN). We selected these models for their low model complexity and fast response time. In both contexts, RF turned out to be the best model, obtaining the best mean AUC of 0.975 in the static context and a mean AUC of 0.910 in the streaming context. RF's advantage over the next-best performing models was also found empirically to be statistically significant.
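The sliding-window collection step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the window width, the transaction features, or how windows advance, so the `Transaction` fields, the `SlidingWindow` class, and the 60-second width below are all hypothetical. The sketch only shows how transactions falling inside a time interval might be gathered into the batch on which a model such as RF would then be trained.

```python
from collections import deque
from typing import Deque, List, NamedTuple


class Transaction(NamedTuple):
    # All fields are hypothetical; the abstract does not list the features.
    timestamp: float  # seconds since an arbitrary reference point
    amount: float
    is_fraud: bool


class SlidingWindow:
    """Keep only the transactions seen in the last `width` seconds."""

    def __init__(self, width: float) -> None:
        self.width = width
        self._items: Deque[Transaction] = deque()

    def add(self, tx: Transaction) -> List[Transaction]:
        # Assumes transactions arrive in non-decreasing timestamp order.
        self._items.append(tx)
        # Evict every transaction that has fallen out of the window.
        while self._items and self._items[0].timestamp <= tx.timestamp - self.width:
            self._items.popleft()
        # The returned batch is what a model would be trained on at this point.
        return list(self._items)


# Toy stream, invented purely for illustration.
stream = [
    Transaction(0.0, 20.0, False),
    Transaction(30.0, 500.0, True),
    Transaction(70.0, 40.0, False),
    Transaction(100.0, 60.0, False),
]

window = SlidingWindow(width=60.0)  # assumed width, not taken from the paper
batches = [window.add(tx) for tx in stream]
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```

Using a deque keeps eviction cheap, since old transactions are only ever removed from the front; in a real deployment the batch would be handed to a streaming engine or an incremental learner rather than collected in a list.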
