As an important data mining technology, high utility itemset mining (HUIM) is used to discover interesting but hidden information (e.g., profit and risk). HUIM has been widely applied in scenarios such as market analysis, medical detection, and web clickstream analysis. However, most previous HUIM approaches ignore the relationships among the items in an itemset, so they discover many irrelevant combinations (e.g., \{gold, apple\} and \{notebook, book\}). To address this limitation, many algorithms have been proposed to mine correlated high utility itemsets (CoHUIs). In this paper, we propose a novel algorithm called Itemset Utility Maximization with Correlation Measure (CoIUM), which considers both the strong correlation and the profitable values of items. The algorithm adopts a database projection mechanism to reduce the cost of database scanning. Moreover, two upper bounds and four pruning strategies are used to prune the search space effectively, and a concise array-based structure named utility-bin calculates and stores the adopted upper bounds in linear time and space. Finally, extensive experimental results on dense and sparse datasets demonstrate that CoIUM significantly outperforms state-of-the-art algorithms in terms of runtime and memory consumption.
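To make the notions of a correlated high utility itemset and an array-based utility-bin concrete, the following is a minimal sketch and not the CoIUM algorithm itself. Several pieces are assumptions that the abstract does not state: the correlation measure is taken to be Kulczynski (Kulc), which is common in the CoHUI literature; the upper bound accumulated in the bin array is the classic transaction-weighted utility (TWU), standing in for the two unnamed upper bounds; and the toy database, unit profits, thresholds, and all function names are hypothetical. The search is an exhaustive enumeration with no database projection.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
DB = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 1},
    {"a": 1, "b": 1, "d": 2},
]
# Hypothetical unit profit of each item.
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 4}


def utility(itemset, db, profit):
    """Sum of quantity * profit over all transactions containing the whole itemset."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def support(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if all(i in t for i in itemset))


def kulc(itemset, db):
    """Kulczynski measure (assumed here): mean of sup(X) / sup({i}) over items i in X."""
    sup_x = support(itemset, db)
    if sup_x == 0:
        return 0.0
    return sum(sup_x / support((i,), db) for i in itemset) / len(itemset)


def twu_bins(db, profit):
    """Accumulate TWU per item in a flat array (one bin per item) in a single scan."""
    items = sorted(profit)
    index = {item: k for k, item in enumerate(items)}
    bins = [0] * len(items)
    for t in db:
        tu = sum(q * profit[i] for i, q in t.items())  # transaction utility
        for i in t:
            bins[index[i]] += tu
    return dict(zip(items, bins))


def correlated_huis(db, profit, min_util, min_cor):
    """Brute-force search: keep itemsets with utility >= min_util and Kulc >= min_cor."""
    twu = twu_bins(db, profit)
    # TWU over-estimates the utility of any superset, so low-TWU items can be dropped.
    promising = sorted(i for i in twu if twu[i] >= min_util)
    result = {}
    for k in range(1, len(promising) + 1):
        for x in combinations(promising, k):
            u = utility(x, db, profit)
            if u >= min_util and kulc(x, db) >= min_cor:
                result[x] = u
    return result


if __name__ == "__main__":
    print(twu_bins(DB, PROFIT))
    print(correlated_huis(DB, PROFIT, min_util=15, min_cor=0.6))
```

In this toy data, the itemset ("a", "b", "d") reaches the utility threshold of 15 but its items rarely co-occur, so the assumed Kulc filter discards it. This is the kind of high-utility but weakly correlated combination the abstract describes.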