Episode discovery from an event sequence is a popular framework for data mining tasks and has many real-world applications. An episode is a partially ordered set of objects (e.g., items or nodes), each of which is associated with an event type, so an episode can also be viewed as a complex event subsequence. High-utility episode mining is a utility-driven mining task of considerable practical interest. Traditional episode mining algorithms rely on a user-specified threshold and usually return a huge number of episodes, a result that is neither intuitive to interpret nor time-efficient to compute. In general, finding a suitable threshold for a pattern-mining algorithm is a non-trivial and time-consuming task. In this paper, we propose a novel algorithm, called Top-K High Utility Episode (THUE) mining within complex event sequences, which redefines the previous mining task as obtaining the K episodes with the highest utility. We introduce several threshold-raising strategies and optimize the episode-weighted utilization upper bounds to speed up the mining process and effectively reduce memory cost. Finally, experimental results on both real-life and synthetic datasets reveal that THUE can offer a running-time improvement of six to eight orders of magnitude over the state-of-the-art algorithm while keeping memory consumption low.
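The general idea behind top-K mining with threshold raising can be illustrated with a min-heap. The sketch below is a generic illustration, not the THUE algorithm itself. It assumes that candidate episodes arrive as hypothetical `(episode, upper_bound, utility)` tuples, where `upper_bound` is any overestimate of the candidate's utility, such as an episode-weighted utilization value. The internal minimum utility threshold starts at 0 and rises to the K-th best utility seen so far. Any candidate whose upper bound falls below that threshold cannot enter the top-K and is pruned. The names `TopKCollector` and `top_k` are illustrative only.

```python
import heapq
from typing import Iterable, List, Tuple


class TopKCollector:
    """Keeps the K highest-utility episodes seen so far in a min-heap (hypothetical helper)."""

    def __init__(self, k: int) -> None:
        self.k = k
        self._heap: List[Tuple[int, str]] = []  # (utility, episode); smallest utility on top

    @property
    def threshold(self) -> int:
        # Internal minimum utility: 0 until K episodes are held, then the K-th best utility.
        return self._heap[0][0] if len(self._heap) == self.k else 0

    def can_prune(self, upper_bound: int) -> bool:
        # Safe only if upper_bound >= the candidate's true utility.
        return upper_bound < self.threshold

    def offer(self, episode: str, utility: int) -> None:
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (utility, episode))
        elif utility > self._heap[0][0]:
            # Replacing the current minimum raises the threshold.
            heapq.heapreplace(self._heap, (utility, episode))

    def results(self) -> List[Tuple[str, int]]:
        # Episodes ordered from highest to lowest utility.
        return [(e, u) for u, e in sorted(self._heap, reverse=True)]


def top_k(candidates: Iterable[Tuple[str, int, int]], k: int):
    """Scan hypothetical (episode, upper_bound, utility) candidates; return top-K and prune count."""
    collector = TopKCollector(k)
    pruned = 0
    for episode, upper_bound, utility in candidates:
        if collector.can_prune(upper_bound):
            pruned += 1  # skipped without examining its exact utility
            continue
        collector.offer(episode, utility)
    return collector.results(), pruned


if __name__ == "__main__":
    cands = [("<A>", 12, 10), ("<B>", 9, 4), ("<A,B>", 20, 15), ("<C>", 5, 3), ("<A,C>", 11, 8)]
    print(top_k(cands, 2))
```

With K = 2 on the toy candidates above, the threshold climbs from 0 to 4 and then to 10. At that point `<C>` (upper bound 5) is pruned without being examined, and the result is `<A,B>` (15) followed by `<A>` (10). Real algorithms gain more by applying such bounds to whole branches of the episode search space rather than to a flat candidate list.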