With the rapid development of online payment platforms, it is now possible to record massive amounts of transaction data. Clustering transaction data contributes significantly to analyzing merchants' behavior patterns, which enables payment platforms to provide differentiated services or implement risk management strategies. However, traditional methods exploit transactions by generating low-dimensional features, which inevitably leads to information loss. In this study, we use the empirical cumulative distribution of each merchant's transactions to characterize that merchant. We adopt the Wasserstein distance to measure the dissimilarity between any two merchants and propose the Wasserstein-distance-based spectral clustering (WSC) approach. A graph of merchants is constructed from the similarities between their transaction distributions, so the clustering of merchants becomes a graph-cut problem, which we solve within the framework of spectral clustering. To ensure the feasibility of the proposed method on large-scale datasets with limited computational resources, we propose a subsampling method for WSC (SubWSC). The associated theoretical properties are investigated to verify the efficiency of the proposed approach. Simulations and an empirical study demonstrate that the proposed method outperforms feature-based methods in finding merchants' behavior patterns.
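The WSC pipeline described above can be illustrated with a minimal NumPy sketch. It does not reproduce the paper's exact choices. It assumes one 1-D sample of transaction amounts per merchant and uses the 1-D Wasserstein-1 distance, which equals the area between the two empirical CDFs, $W_1(F, G) = \int |F(x) - G(x)|\,dx$. The Gaussian kernel with bandwidth `sigma`, the median bandwidth heuristic, the Ng, Jordan and Weiss normalized-affinity embedding, and the k-means step are assumed stand-ins. The helper names `wasserstein_1d`, `kmeans` and `wsc` are hypothetical, and the SubWSC subsampling step is not shown.

```python
import numpy as np


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between the empirical distributions of samples u and v.

    In one dimension W1 is the area between the two empirical CDFs, and
    |F_u(x) - F_v(x)| is constant between consecutive pooled sample points.
    """
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    grid = np.sort(np.concatenate([u, v]))
    widths = np.diff(grid)
    # Empirical CDFs evaluated at the left end of each interval of the pooled grid.
    cdf_u = np.searchsorted(u, grid[:-1], side="right") / u.size
    cdf_v = np.searchsorted(v, grid[:-1], side="right") / v.size
    return float(np.sum(np.abs(cdf_u - cdf_v) * widths))


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def wsc(samples, k, sigma=None):
    """Hypothetical WSC sketch: samples[i] holds merchant i's transaction amounts."""
    n = len(samples)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = wasserstein_1d(samples[i], samples[j])
    if sigma is None:
        # Assumed bandwidth heuristic: median pairwise distance (1.0 if all are zero).
        sigma = float(np.median(dist[np.triu_indices(n, 1)])) or 1.0
    # Assumed Gaussian-kernel affinity graph over merchants, without self-loops.
    W = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))  # assumes every merchant has positive degree
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # The k largest eigenvectors of D^{-1/2} W D^{-1/2} are the k smallest of L_sym = I - M.
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    U = vecs[:, -k:]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the spectral embedding
    return kmeans(U, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic illustration only: 5 small-ticket and 5 large-ticket merchants.
    merchants = [rng.normal(20, 2, 50) for _ in range(5)] + [rng.normal(100, 10, 50) for _ in range(5)]
    print(wsc(merchants, k=2, sigma=5.0))
```

Each merchant is compared through its whole empirical distribution rather than a few summary features, and the clustering step only sees the resulting affinity graph. Swapping in a different kernel, bandwidth rule or graph construction changes only the few lines that build `W`.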