High-utility itemset mining (HUIM) is an emerging data mining task that discovers high-importance patterns in data. Given a minimum utility threshold, an HUIM algorithm extracts all high-utility itemsets (HUIs) whose utility values are no less than the threshold. This can reveal a wealth of useful information, but it does not take users' precise needs into account. In particular, users often want to focus on patterns that contain specific items rather than find all patterns. To address this, targeted mining, which focuses on user preferences, has emerged, although only preliminary work has been conducted so far. For example, the targeted high-utility itemset querying algorithm (TargetUM) uses a lexicographic tree to query itemsets containing a target pattern. However, choosing a minimum utility threshold is difficult when the user is unfamiliar with the database being processed. As a solution, this paper formulates the task of targeted mining of the top-k high-utility itemsets and proposes an efficient algorithm, TMKU, based on TargetUM, to discover the top-k target high-utility itemsets (top-k THUIs). The algorithm also employs several pruning strategies to reduce memory consumption and execution time. Extensive experiments show that TMKU performs well on both real and synthetic datasets.
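To make the task definition concrete, the following is a minimal brute-force sketch of what top-k target HUI mining returns. It assumes the standard HUIM utility model (an item's utility in a transaction is its quantity times its unit profit). It also assumes that "containing a target pattern" means being a superset of the target itemset. This is not TMKU: the toy data and the names `utility` and `top_k_target_huis` are hypothetical, and none of the lexicographic-tree querying or pruning strategies are reproduced. The sketch only illustrates the top-k idea: instead of a user-supplied threshold, the k-th best utility found so far acts as the cut-off.

```python
import heapq
from itertools import combinations

# Hypothetical toy data (assumption): each transaction maps an item to its
# purchase quantity (internal utility); PROFIT maps each item to its unit
# profit (external utility).
DB = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 4},
    {"a": 1, "b": 1, "d": 1},
]
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}


def utility(itemset, db, profit):
    """Standard HUIM utility: sum of profit * quantity of the itemset's items
    over every transaction that contains all of them."""
    return sum(
        sum(profit[i] * t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def top_k_target_huis(db, profit, target, k):
    """Brute-force sketch: the k supersets of `target` with the highest
    positive utility, sorted by utility in descending order."""
    target = frozenset(target)
    others = sorted({i for t in db for i in t} - target)
    heap = []  # min-heap of (utility, itemset); heap[0] holds the k-th best
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            itemset = tuple(sorted(target.union(extra)))
            u = utility(itemset, db, profit)
            if u <= 0:
                continue  # these items never occur together in a transaction
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > heap[0][0]:
                # The implicit threshold rises to the current k-th best utility.
                heapq.heapreplace(heap, (u, itemset))
    return sorted(heap, reverse=True)
```

On the toy data, `top_k_target_huis(DB, PROFIT, {"a"}, 3)` returns `[(20, ('a',)), (19, ('a', 'c')), (16, ('a', 'b'))]`. The exhaustive enumeration is exponential in the number of items, which is exactly why practical algorithms rely on tree structures and pruning.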