Trace clustering has been extensively used to preprocess event logs. By grouping similar behavior, these techniques guide the identification of sub-logs, producing more understandable models and conformance analytics. Nevertheless, little attention has been paid to the relationship between event log properties and clustering quality. In this work, we propose an Automatic Machine Learning (AutoML) framework that, given an event log, recommends the most suitable trace clustering pipeline, encompassing the encoding method, the clustering algorithm, and its hyperparameters. Our experiments were conducted using a thousand event logs, four encoding techniques, and three clustering methods. Results indicate that our framework sheds light on the trace clustering problem and can assist users in choosing the best pipeline for their scenario.
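The pipeline space described above combines three choices: an encoding, a clustering algorithm, and that algorithm's hyperparameters. The snippet below is a minimal sketch of that search space, not the proposed AutoML framework, which recommends a pipeline from the log's properties rather than by exhaustive evaluation. It assumes three toy encodings (activity counts, activity presence, directly-follows counts), k-means as the only algorithm with the number of clusters `k` as its hyperparameter, and the silhouette coefficient as the quality measure. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical encodings: each trace (a list of activity labels) becomes a vector.
def bag_of_activities(log, alphabet):
    """Count how often each activity occurs in a trace."""
    counts = [Counter(t) for t in log]
    return [[float(c[a]) for a in alphabet] for c in counts]

def set_of_activities(log, alphabet):
    """Mark whether each activity occurs in a trace (binary vector)."""
    return [[1.0 if a in t else 0.0 for a in alphabet] for t in log]

def transitions(log, alphabet):
    """Count directly-follows pairs (a, b) in a trace."""
    pairs = [(a, b) for a in alphabet for b in alphabet]
    counts = [Counter(zip(t, t[1:])) for t in log]
    return [[float(c[p]) for p in pairs] for c in counts]

ENCODINGS = {"bag": bag_of_activities, "set": set_of_activities, "transitions": transitions}

def kmeans(X, k, iters=20):
    """Plain k-means; centroids start at the first k distinct vectors (deterministic)."""
    centroids = []
    for x in X:
        if x not in centroids:
            centroids.append(list(x))
        if len(centroids) == k:
            break
    labels = [0] * len(X)
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j])) for x in X]
        # Move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient; returns -1.0 when fewer than two clusters exist."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, x in enumerate(X):
        same = [math.dist(x, y) for j, y in enumerate(X) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(x, y) for j, y in enumerate(X) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def best_pipeline(log, ks=(2, 3)):
    """Exhaustively score every (encoding, k) pair and return (score, encoding, k)."""
    alphabet = sorted({a for t in log for a in t})
    results = []
    for (name, encode), k in product(ENCODINGS.items(), ks):
        X = encode(log, alphabet)
        results.append((silhouette(X, kmeans(X, k)), name, k))
    return max(results)

if __name__ == "__main__":
    # Toy event log with two obvious behaviors.
    log = [list("abcd"), list("abcd"), list("abd"), list("xyz"), list("xyz"), list("xzy")]
    score, encoding, k = best_pipeline(log)
    print(encoding, k, round(score, 3))
```

Exhaustive evaluation grows with every added encoding, algorithm, and hyperparameter value, and each candidate requires clustering the whole log; this cost is what motivates recommending a pipeline from log properties instead.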