Process mining deals with the extraction of knowledge from business process execution logs. Traditional process mining tasks, such as process model generation or conformance checking, rely on a minimalistic feature set in which each event is characterized only by its case identifier, activity type, and timestamp. In contrast, the success of modern machine learning is based on models that take any available data as direct input and build layers of features automatically during training. In this work, we introduce ProcK (Process & Knowledge), a novel pipeline for building business process prediction models that take into account both sequential data in the form of event logs and rich semantic information represented in a graph-structured knowledge base. This hybrid approach enables ProcK to flexibly make use of all information residing in the databases of organizations. The pipeline includes components that extract inter-linked event logs and knowledge bases from relational databases. We demonstrate the power of ProcK by training it for prediction tasks on the OULAD e-learning dataset, where we achieve state-of-the-art performance on the tasks of predicting student dropout from courses and predicting student success. We also apply our method to a number of additional machine learning tasks, including exam score prediction and early predictions that take into account only data recorded during the first weeks of the courses.
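To make the two input types mentioned above concrete, the following is a minimal sketch, not ProcK's actual data model. It assumes an event is a (case identifier, activity, timestamp) record and that knowledge-base facts are (subject, predicate, object) triples; the class name `Event`, the helpers `group_into_traces` and `neighbours`, and all sample values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Minimal event record: case identifier, activity type, timestamp."""
    case_id: str
    activity: str
    timestamp: datetime


def group_into_traces(events):
    """Group events by case and return each case's activities in time order."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event.case_id].append(event)
    return {
        case_id: [e.activity for e in sorted(case_events, key=lambda e: e.timestamp)]
        for case_id, case_events in by_case.items()
    }


def neighbours(triples, entity):
    """Return entities linked to `entity` by any triple, in sorted order."""
    linked = {o for s, _, o in triples if s == entity}
    linked |= {s for s, _, o in triples if o == entity}
    return sorted(linked)


# Hypothetical event log (illustrative values only).
events = [
    Event("student_1", "submit_assessment", datetime(2014, 2, 10)),
    Event("student_1", "register", datetime(2014, 1, 5)),
    Event("student_2", "register", datetime(2014, 1, 7)),
]

# Hypothetical knowledge base as (subject, predicate, object) triples.
knowledge_base = [
    ("student_1", "enrolled_in", "course_A"),
    ("course_A", "has_assessment", "exam_1"),
]

traces = group_into_traces(events)
linked_to_course = neighbours(knowledge_base, "course_A")
```

Here `traces` holds the sequential view used by traditional process mining, while `linked_to_course` shows the kind of graph context that an event-only representation would leave out.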