Discovering utility-driven patterns is a useful but difficult research topic. It can extract significant and interesting information from specific and varied databases, increasing the value of the services provided. In practice, utility is often used to measure the importance, profit, or risk of an object or a pattern. Although utility is a flexible criterion that can be set for each pattern, it is also a rather absolute one because it neglects utility sharing. As a result, the derived patterns capture only partial and local knowledge from a database. Utility occupancy is a recently proposed model that addresses the problem of mining patterns with high utility but low occupancy. However, existing studies concentrate on itemsets, which do not reveal the temporal relationships among object occurrences. Therefore, this paper works toward sequence utility maximization. We first define utility occupancy on sequence data and formulate the problem of High Utility-Occupancy Sequential Pattern Mining (HUOSPM). Three dimensions, namely frequency, utility, and occupancy, are jointly evaluated in HUOSPM. We propose an algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU). Furthermore, two data structures for storing information about a pattern, the Utility-Occupancy-List-Chain (UOL-Chain) and the Utility-Occupancy-Table (UO-Table), are designed together with six associated upper bounds to improve efficiency. Experiments are carried out to evaluate the efficiency and effectiveness of the novel algorithm, and the influence of the different upper bounds and pruning strategies is analyzed and discussed. The comprehensive results suggest that our algorithm is intelligent and effective.
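To make the three dimensions concrete, the following is a minimal sketch, not the SUMU algorithm itself. It assumes a simplified setting that the abstract does not specify: each sequence is a list of single-item events with a utility value, the utility of a pattern in a sequence is the maximum over its subsequence matches, and the utility occupancy of a pattern is the average, over supporting sequences, of pattern utility divided by total sequence utility. The names `best_match_utility` and `evaluate` are hypothetical, and the sketch uses neither the UOL-Chain, the UO-Table, nor any upper bound.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, float]  # (item, utility of this occurrence); assumed format


def best_match_utility(seq: List[Event], pattern: List[str]) -> Optional[float]:
    """Maximum total utility over all subsequence matches of `pattern` in `seq`.

    Returns None when the pattern does not occur in the sequence.
    """
    # best[k] = highest utility of any match of the first k pattern items so far
    best: List[Optional[float]] = [0.0] + [None] * len(pattern)
    for item, util in seq:
        # Go from long prefixes to short ones so an event extends only
        # matches that ended strictly before it.
        for k in range(len(pattern), 0, -1):
            if pattern[k - 1] == item and best[k - 1] is not None:
                candidate = best[k - 1] + util
                if best[k] is None or candidate > best[k]:
                    best[k] = candidate
    return best[len(pattern)]


def evaluate(db: List[List[Event]], pattern: List[str]) -> Tuple[int, float, float]:
    """Return (support, total utility, utility occupancy) under the assumptions above."""
    utilities: List[float] = []
    ratios: List[float] = []
    for seq in db:
        u = best_match_utility(seq, pattern)
        if u is None:
            continue  # this sequence does not support the pattern
        total = sum(util for _, util in seq)
        utilities.append(u)
        ratios.append(u / total)
    support = len(ratios)
    occupancy = sum(ratios) / support if support else 0.0
    return support, sum(utilities), occupancy


# Toy database (made-up values, for illustration only).
db = [
    [("a", 2.0), ("b", 3.0), ("a", 5.0)],  # <a, b> matches with utility 5 of 10
    [("a", 1.0), ("c", 2.0), ("b", 5.0)],  # <a, b> matches with utility 6 of 8
    [("c", 6.0)],                          # no match
]
print(evaluate(db, ["a", "b"]))  # (2, 11.0, 0.625)
```

Under these assumptions, a pattern would qualify only if all three values clear their respective thresholds, which is why a pattern with high utility can still be rejected when it accounts for a small share of the sequences that contain it.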