Process mining enables the analysis of complex systems using event data recorded during the execution of processes. In particular, models of these processes can be discovered from event logs, i.e., sequences of recorded events. However, the recorded events are often too fine-grained and yield unstructured models that are not meaningful for analysis. Log abstraction therefore groups events to obtain a higher-level representation of the event sequences. While such a transformation should be driven by the analysis goal, existing techniques force users to define how the abstraction is done rather than what the result should be. In this paper, we propose GECCO, an approach for log abstraction that enables users to impose requirements on the resulting log in the form of constraints. GECCO then groups events so that the constraints are satisfied and the distance to the original log is minimized. Since exhaustive log abstraction has exponential runtime complexity, GECCO also offers a heuristic approach guided by behavioral dependencies found in the log. We show that the abstraction quality of GECCO is superior to baseline solutions and demonstrate the relevance of considering constraints during log abstraction in real-life settings.
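To make the idea of constraint-driven abstraction concrete, the following toy sketch groups event classes exhaustively, keeps only groupings that satisfy the constraints, and returns the one closest to the original log. It is an illustration under stated assumptions, not GECCO's actual algorithm. The two constraints (`max_activities` on the resulting log and `max_group_size` on each group) are hypothetical examples. The distance measure `interruptions` is also a hypothetical stand-in: it counts how often a group's events are split apart by events of other groups. Likewise, the relabel-and-merge rule in `abstract_log` is an assumption made for this example.

```python
from itertools import groupby


def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or put it into a new block of its own.
        yield [[first]] + partition


def abstract_log(log, partition):
    """Hypothetical rule: relabel events by group and merge consecutive events of the same group."""
    label = {cls: "+".join(sorted(block)) for block in partition for cls in block}
    return [[group for group, _ in groupby(label[e] for e in trace)] for trace in log]


def interruptions(abstracted):
    """Hypothetical distance: how often a group's events are split apart by other groups."""
    return sum(len(trace) - len(set(trace)) for trace in abstracted)


def exhaustive_abstraction(log, max_activities, max_group_size):
    """Enumerate all groupings, keep those meeting both constraints, return the closest one."""
    classes = sorted({e for trace in log for e in trace})
    best = None
    for partition in set_partitions(classes):
        if len(partition) > max_activities:  # constraint on the abstracted log
            continue
        if any(len(block) > max_group_size for block in partition):  # constraint per group
            continue
        abstracted = abstract_log(log, partition)
        dist = interruptions(abstracted)
        if best is None or dist < best[0]:
            best = (dist, abstracted)
    return best


# Usage: b and c are interleaved, so grouping them causes no interruptions.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
dist, abstracted = exhaustive_abstraction(log, max_activities=3, max_group_size=2)
print(dist, abstracted)  # 0 [['a', 'b+c', 'd'], ['a', 'b+c', 'd']]
```

With n event classes, `set_partitions` yields the n-th Bell number of candidate groupings (15 for four classes, 115,975 for ten). This growth is why exhaustive abstraction does not scale, and it motivates a heuristic that only explores groupings suggested by behavioral dependencies.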