Graph-based Neural Networks (GNNs) are recent models for learning representations of nodes (and graphs) that have achieved promising results in detecting patterns in large-scale data relating different entities. Among these patterns, financial fraud stands out for its socioeconomic relevance and for its particular challenges, such as the extreme imbalance between the positive (fraud) and negative (legitimate transaction) classes, and concept drift (i.e., the statistical properties of the data change over time). Because GNNs rely on message propagation, a node's representation is strongly influenced by its neighbors and by the network's hubs, which amplifies the effects of class imbalance. Recent works attempt to adapt undersampling and oversampling strategies to GNNs to mitigate this effect, but they do not account for concept drift. In this work, we conduct experiments to evaluate existing techniques for detecting fraud in networks under both of these challenges. To this end, we use real data sets, complemented by synthetic data generated with a new methodology introduced here. Based on this analysis, we propose a series of improvement points to be investigated in future research.