AntiBenford Subgraphs: Unsupervised Anomaly Detection in Financial Networks

Benford's law describes the distribution of the first digit of numbers appearing in a wide variety of numerical data, including tax records and election outcomes, and has been used to raise "red flags" about potential anomalies in the data, such as tax evasion. In this work, we ask the following novel question: given a large transaction or financial graph, how do we find a set of nodes that perform many transactions among each other and whose transactions also deviate significantly from Benford's law? We propose the AntiBenford subgraph framework, which is founded on well-established statistical principles. Furthermore, we design an efficient algorithm that finds AntiBenford subgraphs in near-linear time on real data. We evaluate our framework on both real and synthetic data against a variety of competitors. We show empirically that our proposed framework enables the detection of anomalous subgraphs in cryptocurrency transaction networks that go undetected by state-of-the-art graph-based anomaly detection methods. Our empirical findings show that the AntiBenford framework is able to mine anomalous subgraphs and to provide novel insights into financial transaction data. The code and the datasets are available at https://github.com/tsourakakis-lab/antibenford-subgraphs.
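To make the notion of "deviating from Benford's law" concrete, the following is a minimal Python sketch, not the authors' algorithm. It computes the Benford first-digit probabilities, P(d) = log10(1 + 1/d) for d = 1..9, and a Pearson chi-square statistic measuring how far a set of transaction amounts departs from them. The function names `first_digit` and `benford_chi_square` are hypothetical, and the sketch scores a single list of amounts rather than searching a graph for a dense subgraph.

```python
import math
from collections import Counter

# Benford's law: probability that the leading digit equals d, for d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the leading nonzero decimal digit of a finite, nonzero number."""
    # Scientific notation puts the leading digit first, e.g. '1.23...e-03'.
    return int(f"{abs(x):.16e}"[0])


def benford_chi_square(amounts):
    """Pearson chi-square statistic of first digits against Benford's law.

    Zero amounts are skipped because they have no leading digit.
    Larger values indicate a stronger deviation from Benford's law.
    """
    values = [a for a in amounts if a != 0]
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(first_digit(a) for a in values)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in BENFORD.items()
    )


if __name__ == "__main__":
    near_benford = [2 ** k for k in range(1, 201)]  # powers of 2 track Benford closely
    suspicious = [900, 950, 9100] * 50              # every amount starts with 9
    print(benford_chi_square(near_benford))
    print(benford_chi_square(suspicious))
```

With nine digit classes the statistic has 8 degrees of freedom, so a common rule of thumb is to flag values above roughly 15.5 (the 5% critical value), provided the sample is large enough for the chi-square approximation to hold.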


Related content

Anomaly detection


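In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to the other items in a dataset. Anomalous items typically translate into problems such as bank fraud, structural defects, medical problems, or errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions. In the detection of abuse and network intrusion in particular, the interesting objects are often not rare objects but unexpected bursts of activity. Such patterns do not fit the usual statistical definition of an outlier as a rare object, so many anomaly detection methods (unsupervised methods in particular) fail on such data unless it has been aggregated appropriately. A cluster analysis algorithm, by contrast, may be able to detect the micro-clusters formed by these patterns. There are three broad categories of anomaly detection methods. [1] Unsupervised anomaly detection methods detect anomalies in unlabeled test data by looking for the instances that fit least well with the rest of the data, under the assumption that the majority of instances in the dataset are normal. Supervised anomaly detection methods require a dataset labeled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of anomaly detection). Semi-supervised anomaly detection methods build a model of normal behavior from a given normal training dataset and then test the likelihood that a test instance was generated by the learned model.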
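As a small illustration of the unsupervised case, the sketch below flags the values that fit least well with the rest of an unlabeled sample, assuming most values are normal. It uses the modified z-score based on the median absolute deviation (MAD); this is one simple technique chosen for illustration, not a method named in the text above. The function name `mad_outliers` is hypothetical, and the constant 0.6745 and threshold 3.5 follow a common convention.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. Medians are used so that a few extreme
    values do not distort the estimate of "normal" behavior.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Spread is zero for at least half the data; the score is undefined.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


if __name__ == "__main__":
    amounts = [10, 11, 9, 10, 12, 10, 95]
    print(mad_outliers(amounts))  # index of the value 95
```

As the passage notes, a per-value score like this assumes anomalies are rare; bursts of coordinated activity may only become visible after the data is aggregated or clustered.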
