High utility sequential pattern mining (HUSPM) aims to mine all patterns that yield a high utility (profit) in a sequence dataset. HUSPM is useful for several applications such as market basket analysis, marketing, and website clickstream analysis. In these applications, users may also want high utility patterns that appear frequently in the dataset, since these provide more useful information. However, this task is computationally expensive, since algorithms may generate a combinatorially explosive number of candidates that may be redundant or of low importance. To reduce complexity and obtain a compact set of frequent high utility sequential patterns (FHUSPs), this paper proposes an algorithm named CHUSP for mining closed frequent high utility sequential patterns (CHUSPs). Such patterns form a concise representation while preserving the expressive power of the complete set of FHUSPs. The proposed algorithm relies on a CHUS data structure to maintain information during mining. It uses three pruning strategies to eliminate low-utility and infrequent patterns early, thereby reducing the search space. An extensive experimental evaluation was performed on six real-life datasets to evaluate the performance of CHUSP in terms of execution time, memory usage, and the number of generated patterns. Experimental results show that CHUSP can efficiently discover the compact set of CHUSPs under different user-defined thresholds.
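To make the notion of a closed frequent high utility sequential pattern concrete, the following is a minimal brute-force sketch on a toy dataset. It is not the CHUSP algorithm: it uses neither the CHUS data structure nor the three pruning strategies, and it enumerates every candidate explicitly. Several points are assumptions made for illustration only: each sequence is a list of single-item events with a utility, the utility of a pattern in a sequence is the maximum over its occurrences, and an FHUSP is treated as closed when no longer FHUSP containing it has the same support. The names `DATASET`, `max_utility`, `support_and_utility`, `candidates`, and `closed_fhusps` are hypothetical.

```python
from itertools import combinations

# Toy dataset (assumed): each sequence is a list of (item, utility) events.
DATASET = [
    [("a", 2), ("b", 3), ("c", 1)],
    [("a", 1), ("c", 4)],
    [("a", 2), ("b", 1), ("c", 2)],
]


def max_utility(pattern, sequence):
    """Maximum utility of `pattern` as a subsequence of `sequence`, or None if absent."""
    # best[k] is the best utility found so far for the first k pattern items.
    best = [0] + [None] * len(pattern)
    for item, util in sequence:
        # Go backwards so one event is never matched to two pattern positions.
        for k in range(len(pattern), 0, -1):
            if pattern[k - 1] == item and best[k - 1] is not None:
                cand = best[k - 1] + util
                if best[k] is None or cand > best[k]:
                    best[k] = cand
    return best[-1]


def support_and_utility(pattern, dataset):
    """Support (number of containing sequences) and total utility of `pattern`."""
    utils = [u for u in (max_utility(pattern, s) for s in dataset) if u is not None]
    return len(utils), sum(utils)


def is_subsequence(small, large):
    """True if `small` occurs in `large` with order preserved."""
    it = iter(large)
    return all(x in it for x in small)


def candidates(dataset):
    """Every distinct non-empty subsequence of items (exponential, toy use only)."""
    seen = set()
    for seq in dataset:
        items = [item for item, _ in seq]
        for n in range(1, len(items) + 1):
            seen.update(combinations(items, n))
    return seen


def closed_fhusps(dataset, min_sup, min_util):
    """Return (all FHUSPs, closed FHUSPs) as dicts: pattern -> (support, utility)."""
    stats = {p: support_and_utility(p, dataset) for p in candidates(dataset)}
    fhusps = {p: su for p, su in stats.items()
              if su[0] >= min_sup and su[1] >= min_util}
    closed = {}
    for p, (sup, util) in fhusps.items():
        # Assumed closedness test: no longer FHUSP contains p with equal support.
        absorbed = any(
            len(q) > len(p) and fhusps[q][0] == sup and is_subsequence(p, q)
            for q in fhusps
        )
        if not absorbed:
            closed[p] = (sup, util)
    return fhusps, closed


if __name__ == "__main__":
    all_patterns, closed_patterns = closed_fhusps(DATASET, min_sup=2, min_util=7)
    print(len(all_patterns), "FHUSPs;", len(closed_patterns), "closed")
    for pattern, (sup, util) in sorted(closed_patterns.items()):
        print(pattern, "support =", sup, "utility =", util)
```

With a minimum support of 2 and a minimum utility of 7, this toy dataset has five FHUSPs but only two closed ones, which illustrates the compaction the abstract describes. A practical miner would avoid the exhaustive enumeration by pruning the search space.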