Because collecting labels is costly and time-consuming, unsupervised methods are preferred in applications such as fraud detection. Such applications usually require modeling the intrinsic clusters in high-dimensional data, which often exhibits heterogeneous statistical patterns because the patterns of different clusters may appear in different dimensions. Existing methods model the data clusters on a selected subset of dimensions, yet globally omitting any dimension may damage the pattern of certain clusters. To address these issues, we propose a novel unsupervised generative framework called FIRD, which utilizes adversarial distributions to fit and disentangle the heterogeneous statistical patterns. When applied to discrete spaces, FIRD effectively distinguishes synchronized fraudsters from normal users. FIRD also outperforms state-of-the-art (SOTA) anomaly detection methods on anomaly detection datasets, with an average AUC improvement of over 5%. Experimental results on various datasets verify that the proposed method better models the heterogeneous statistical patterns in high-dimensional data and benefits downstream applications.