Utility mining has emerged as an important topic owing to its wide range of applications. However, conventional utility mining methods are biased toward items with longer on-shelf time, because such items have a greater chance of generating high utility. To eliminate this bias, the problem of on-shelf utility mining (OSUM) was introduced. In this paper, we focus on OSUM of sequence data, where the sequential database is divided into several partitions according to time periods, and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. To improve efficiency, we also design several strategies that reduce the search space and avoid redundant calculation, using two upper bounds: time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, we develop two novel data structures to facilitate the calculation of upper bounds and utilities. Extensive experiments on real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for memory-constrained settings, while OSUMS+ suits a wider range of real-life applications owing to its high efficiency.