Prescriptive Process Monitoring is a prominent problem in Process Mining, which consists of identifying a set of actions to recommend with the goal of optimising a target measure of interest, or Key Performance Indicator (KPI). One challenge that makes this problem difficult is the need to build Prescriptive Process Monitoring techniques based only on temporally annotated (process) execution data, stored in so-called execution logs, due to the lack of well-crafted and human-validated explicit models. In this paper we propose an AI-based approach that learns, by means of Reinforcement Learning (RL), an optimal policy (almost) exclusively from the observation of past executions, and recommends the best activities to carry out in order to optimise a KPI of interest. This is achieved by first learning, from data, a Markov Decision Process for the specific KPI, and then using RL training to learn the optimal policy. The approach is validated on real and synthetic datasets and compared with off-policy Deep RL approaches. The ability of our approach to match, and often outperform, Deep RL approaches provides a contribution towards the exploitation of white-box RL techniques in scenarios where only temporal execution data are available.
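To make the two-step idea concrete, the following is a minimal, illustrative sketch rather than the paper's actual implementation: it estimates a tabular MDP from a toy event log, assuming the last executed activity as the state abstraction and the trace-level KPI as a terminal reward, and then runs tabular Q-learning on the estimated model to extract a recommendation policy. The toy log, the state abstraction, and all identifiers (`log`, `actions_in`, `Q`) are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical event log: (sequence of activities, KPI observed at trace end).
log = [
    (["register", "check", "approve"], 1.0),
    (["register", "check", "reject"], 0.0),
    (["register", "recheck", "approve"], 0.8),
]

# Step 1: estimate the MDP from the log -- which activities are enabled in each
# state (state = last executed activity) and the average KPI credited to the
# final (state, activity) step of each trace.
actions_in = defaultdict(set)   # state -> activities observed next in that state
reward = defaultdict(float)     # (state, activity) -> running mean of terminal KPI
count = defaultdict(int)
for trace, kpi in log:
    state = "<start>"
    for i, act in enumerate(trace):
        actions_in[state].add(act)
        if i == len(trace) - 1:   # terminal step: credit the trace-level KPI
            count[(state, act)] += 1
            reward[(state, act)] += (kpi - reward[(state, act)]) / count[(state, act)]
        state = act               # toy deterministic transition: next state = activity
# Step 2: tabular Q-learning on the estimated model to learn the optimal policy.
Q = defaultdict(float)
alpha, gamma, eps = 0.5, 1.0, 0.2
for _ in range(2000):
    state = "<start>"
    while actions_in[state]:      # an empty action set marks a terminal state
        acts = sorted(actions_in[state])
        act = (random.choice(acts) if random.random() < eps
               else max(acts, key=lambda a: Q[(state, a)]))
        future = max((Q[(act, b)] for b in actions_in[act]), default=0.0)
        Q[(state, act)] += alpha * (reward[(state, act)] + gamma * future - Q[(state, act)])
        state = act

# The learned policy: the recommended activity in each non-terminal state.
for s in sorted(s for s in actions_in if actions_in[s]):
    best = max(sorted(actions_in[s]), key=lambda a: Q[(s, a)])
    print(f"in state {s!r}: recommend {best!r}")
```

On this toy log the greedy policy recommends "check" after "register" and "approve" after "check", i.e. the path with the highest average KPI; the paper's setting differs in using richer state abstractions and real logs, but the learn-the-MDP-then-train-the-policy structure is the same.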