Process mining studies ways to derive value from process executions recorded in the event logs of IT systems. Process discovery is the task of inferring a process model from an event log emitted by some unknown system. One quality criterion for discovered process models is generalization, which seeks to quantify how well the discovered model describes future executions of the system; it is perhaps the least understood quality criterion in process mining. This lack of understanding is primarily a consequence of generalization seeking to measure properties over the entire future behavior of the system when the only available sample of that behavior is the event log itself. In this paper, we draw inspiration from computational statistics and employ a bootstrap approach to estimate properties of a population based on a sample. Specifically, we define an estimator of the model's generalization based on the event log it was discovered from, and then use bootstrapping to measure the generalization of the model with respect to the system, along with its statistical significance. Experiments demonstrate the feasibility of the approach in industrial settings.
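The general bootstrap idea can be illustrated with a minimal sketch. This is not the estimator defined in the paper, which the abstract does not specify. It assumes an event log is a list of traces (tuples of activity labels) and uses a hypothetical stand-in estimator, `accepted_fraction`, that treats the model as a plain set of accepted traces. The names `bootstrap_estimate`, `model_traces`, and `accepted_fraction` are illustrative assumptions.

```python
import random
from statistics import mean


def bootstrap_estimate(log, estimator, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap over traces (generic sketch, not the paper's method).

    Resamples the log with replacement, re-applies the estimator to each
    resample, and returns the mean estimate with a (1 - alpha) interval.
    """
    rng = random.Random(seed)
    n = len(log)
    values = sorted(
        estimator(rng.choices(log, k=n)) for _ in range(n_resamples)
    )
    low = values[int((alpha / 2) * n_resamples)]
    high = values[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return mean(values), (low, high)


# Hypothetical stand-in for a discovered model: the set of traces it accepts.
model_traces = {("a", "b", "c"), ("a", "c", "b")}


def accepted_fraction(sample):
    """Hypothetical estimator: share of sampled traces the model accepts."""
    return sum(trace in model_traces for trace in sample) / len(sample)


# Toy event log with ten traces, one of which the model does not accept.
log = [("a", "b", "c")] * 6 + [("a", "c", "b")] * 3 + [("a", "b", "b", "c")]

point, (low, high) = bootstrap_estimate(log, accepted_fraction)
print(f"estimate={point:.3f}, 95% interval=({low:.3f}, {high:.3f})")
```

The width of the interval gives a rough sense of how much the estimate depends on the particular sample of traces, which is the role bootstrapping plays when only a single event log of the system is available.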