High-utility sequential pattern mining (HUSPM) has emerged as an important research topic because of its wide range of applications. However, the search space grows combinatorially when the utility threshold is low or the data are large-scale, so solving the HUSPM problem can be costly in both time and memory. Several algorithms have been proposed to address this problem, but their running time and memory usage remain high. In this paper, to solve the problem more efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP uses a compact seq-array to store the necessary information from a sequence database, and the seqPro structure is designed to efficiently calculate the utilities and upper-bound values of candidate patterns. Furthermore, a new upper bound on utility, namely tighter reduced sequence utility (TRSU), and two search-space pruning strategies are used to improve the mining performance of HUSP-SP. Experimental results on both synthetic and real-life datasets show that HUSP-SP significantly outperforms state-of-the-art algorithms in terms of running time, memory usage, search-space pruning efficiency, and scalability.
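To make the problem statement concrete, the following minimal sketch computes the utility of a sequential pattern by brute force. It is not HUSP-SP and does not reproduce the seq-array, seqPro, or TRSU. It assumes the conventional HUSPM definitions, which the abstract does not spell out: each item occurrence carries a quantity, each item has a unit profit, the utility of a pattern in one sequence is the maximum over all of its occurrences, and a pattern is high-utility when its total utility reaches a chosen fraction of the database utility. All names and data (`profit`, `database`, `min_util_ratio`, and the functions) are hypothetical.

```python
def pattern_utility_in_sequence(pattern, qseq, profit):
    """Return the maximum utility over all occurrences of `pattern` in `qseq`,
    or None if the pattern does not occur.

    pattern: list of itemsets (tuples of items), in order.
    qseq:    list of itemsets, each a dict mapping item -> quantity.
    profit:  dict mapping item -> unit profit (assumed external utility).
    """
    def best(p_idx, start):
        # All pattern itemsets matched: nothing more to add.
        if p_idx == len(pattern):
            return 0
        result = None
        for pos in range(start, len(qseq)):
            itemset = qseq[pos]
            if all(item in itemset for item in pattern[p_idx]):
                rest = best(p_idx + 1, pos + 1)
                if rest is not None:
                    here = sum(itemset[i] * profit[i] for i in pattern[p_idx])
                    if result is None or here + rest > result:
                        result = here + rest
        return result

    return best(0, 0)


def pattern_utility(pattern, database, profit):
    """Sum the per-sequence utilities over sequences that contain the pattern."""
    total = 0
    for qseq in database:
        u = pattern_utility_in_sequence(pattern, qseq, profit)
        if u is not None:
            total += u
    return total


def database_utility(database, profit):
    """Total utility of every item occurrence in the database."""
    return sum(q * profit[i]
               for qseq in database
               for itemset in qseq
               for i, q in itemset.items())


# Hypothetical toy data, for illustration only.
profit = {"a": 2, "b": 5, "c": 1}
database = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 4}],
    [{"a": 3}, {"b": 1, "c": 2}],
    [{"b": 2}, {"a": 1}, {"c": 5}],
]
pattern = [("a",), ("c",)]   # the sequential pattern <a, c>
min_util_ratio = 0.3         # assumed relative utility threshold

u = pattern_utility(pattern, database, profit)
threshold = min_util_ratio * database_utility(database, profit)
print(u, threshold, u >= threshold)
```

Enumerating every candidate pattern this way is exactly what becomes infeasible at low thresholds or on large databases, which is why upper bounds and pruning strategies matter.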