On-shelf utility mining (OSUM) is an emerging research direction in data mining. It aims to discover itemsets that have high relative utility within the time periods in which they are on sale. Compared with traditional utility mining, OSUM can find more practical and meaningful patterns in real-life applications. However, traditional OSUM has a major drawback: for ordinary users, it is hard to choose a minimum utility threshold minutil that yields the right number of on-shelf high-utility itemsets. If the threshold is set too high, too few patterns are found. If it is set too low, too many patterns are discovered, wasting running time and memory. To address this issue, the user can instead specify a parameter k, so that only the top-k itemsets with the highest relative utility are returned. In this paper, we therefore propose a generic algorithm named TOIT for mining Top-k On-shelf hIgh-utility paTterns. TOIT applies a novel strategy that raises minutil based on the on-shelf datasets. In addition, two novel upper bounds, called subtree utility and local utility, are used to prune the search space. With these strategies, TOIT narrows the search space as early as possible, improves mining efficiency, and reduces memory consumption. A series of experiments was conducted on real datasets with different characteristics to compare TOIT with the state-of-the-art KOSHU algorithm. The experimental results show that TOIT outperforms KOSHU in both running time and memory consumption.
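The abstract does not spell out TOIT's threshold-raising strategy or its subtree-utility and local-utility bounds, so the following is only a minimal, brute-force sketch of the two underlying ideas it relies on. It assumes a common OSUM formulation: an itemset's relative utility is its utility divided by the total utility of the periods in which it is on shelf, and an itemset is assumed to be on shelf in a period when every one of its items is sold in that period. It also shows the generic top-k mechanism of keeping a min-heap of the k best itemsets and raising minutil to the k-th best value found so far. The names `example_db`, `on_shelf_periods`, `relative_utility`, and `top_k_relative_utility` are hypothetical. This is neither TOIT nor KOSHU: it enumerates every itemset and prunes nothing.

```python
import heapq
from itertools import combinations

# Hypothetical toy database: (time period, {item: utility of that item in the transaction}).
example_db = [
    (1, {"a": 5, "b": 2}),
    (1, {"a": 1, "b": 4}),
    (2, {"b": 3, "c": 6}),
    (2, {"a": 2, "c": 1}),
]


def on_shelf_periods(itemset, db):
    """Periods in which every item of `itemset` is sold (assumed definition)."""
    periods = None
    for item in itemset:
        sold_in = {period for period, items in db if item in items}
        periods = sold_in if periods is None else periods & sold_in
    return periods or set()


def relative_utility(itemset, db):
    """Utility of `itemset` divided by the total utility of its on-shelf periods."""
    periods = on_shelf_periods(itemset, db)
    if not periods:
        return 0.0
    # Utility of the itemset, summed over transactions that contain all of its items.
    utility = sum(
        sum(items[i] for i in itemset)
        for _, items in db
        if all(i in items for i in itemset)
    )
    # Total utility of every transaction in the itemset's on-shelf periods.
    period_utility = sum(
        sum(items.values()) for period, items in db if period in periods
    )
    return utility / period_utility if period_utility else 0.0


def top_k_relative_utility(db, k):
    """Brute-force top-k search; minutil rises to the k-th best value seen so far."""
    all_items = sorted({item for _, items in db for item in items})
    heap = []  # min-heap of (relative utility, itemset); heap[0] holds the k-th best
    minutil = 0.0
    for size in range(1, len(all_items) + 1):
        for itemset in combinations(all_items, size):
            ru = relative_utility(itemset, db)
            if len(heap) < k:
                heapq.heappush(heap, (ru, itemset))
            elif ru > minutil:
                heapq.heapreplace(heap, (ru, itemset))
            if len(heap) == k:
                minutil = heap[0][0]  # raise the threshold to the current k-th best
    return sorted(heap, reverse=True), minutil


if __name__ == "__main__":
    top, minutil = top_k_relative_utility(example_db, k=3)
    for ru, itemset in top:
        print(itemset, round(ru, 3))
    print("final minutil:", minutil)
```

On this toy database the top-3 itemsets are {b, c} (0.75), {c} (about 0.583) and {a, b} (0.5), so minutil ends at 0.5. Note that {b, c} scores higher than its subset {b} (0.375): relative utility is not anti-monotone, which is why a real top-k OSUM algorithm needs upper bounds such as the subtree and local utilities mentioned above to prune safely, instead of the exhaustive enumeration used here.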