For applied intelligence, utility-driven pattern discovery algorithms can identify insightful and useful patterns in databases. However, these techniques can produce a huge number of patterns, while the user is often interested in only a few of them. Hence, targeted high-utility itemset mining has emerged as a key research topic, where the aim is to find the subset of patterns that meet a targeted pattern constraint instead of all patterns. This is a challenging task because efficiently finding tailored patterns in a very large search space requires a dedicated targeted mining algorithm. A first algorithm, TargetUM, has been proposed; it adopts an approach similar to post-processing using a tree structure, but its running time and memory consumption are unsatisfactory in many situations. In this paper, we address this issue by proposing a novel list-based algorithm with a pattern-matching mechanism, named THUIM (Targeted High-Utility Itemset Mining), which can quickly match high-utility itemsets during the mining process to select the targeted patterns. Extensive experiments were conducted on different datasets to compare the performance of the proposed algorithm with that of state-of-the-art algorithms. Results show that THUIM performs very well in terms of runtime and memory consumption, and scales well compared to TargetUM.
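To make the task concrete, the following is a minimal brute-force sketch of targeted high-utility itemset mining on a toy database. It is not THUIM or TargetUM; it only illustrates the problem statement. The data, the names `utility` and `targeted_huis`, and the reading of the target constraint as "the itemset must contain every target item" are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 1},
    {"a": 1, "b": 1, "d": 2},
]
# Hypothetical unit profit per item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit of `itemset` over transactions containing all its items."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def targeted_huis(transactions, unit_profit, target, min_util):
    """Brute force: enumerate every superset of `target` and keep those with utility >= min_util.

    Assumption: the target constraint means the itemset contains all target items.
    """
    target = frozenset(target)
    others = sorted(set(unit_profit) - target)
    result = {}
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            itemset = target | frozenset(extra)
            u = utility(itemset, transactions, unit_profit)
            if u >= min_util:
                result[itemset] = u
    return result


found = targeted_huis(transactions, unit_profit, target={"a"}, min_util=15)
for itemset, u in sorted(found.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), u)
```

With target `{"a"}` and a minimum utility of 15, the sketch keeps `{a}`, `{a, c}`, `{a, b}`, and `{a, b, d}`. Note that `{a, d}` falls below the threshold while its superset `{a, b, d}` meets it: utility is not anti-monotone, which is one reason exhaustive enumeration like this does not scale and dedicated algorithms are needed.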