Process mining is being rapidly adopted in industry. Consequently, privacy concerns about the sensitive and private information contained in the event data used by process mining algorithms are becoming increasingly relevant. State-of-the-art research mainly focuses on providing privacy guarantees, e.g., differential privacy, for the trace variants used by the main process mining techniques, e.g., process discovery. However, privacy preservation techniques for releasing trace variants still do not fulfill all the requirements of industry-scale usage. Moreover, providing privacy guarantees when there is a high rate of infrequent trace variants remains a challenge. In this paper, we introduce TraVaG, a new approach for releasing differentially private trace variants based on Generative Adversarial Networks (GANs) that provides industry-scale benefits and enhances the level of privacy guarantees when there is a high ratio of infrequent variants. In addition, TraVaG overcomes shortcomings of conventional privacy preservation techniques, such as the need to bound the length of variants and the introduction of fake variants. Experimental results on real-life event data show that our approach outperforms state-of-the-art techniques in terms of privacy guarantees, plain data utility preservation, and result utility preservation.
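To make the shortcomings named above concrete, the following is a minimal Python sketch of one conventional kind of differentially private variant release. It is an assumed illustration, not TraVaG and not necessarily the specific baseline the paper compares against. Each case is reduced to its trace variant (its sequence of activities), and the Laplace mechanism adds noise to the count of every possible activity sequence up to a chosen length. The event-log layout (a dict from case id to an activity list), the public activity alphabet, and all function names are hypothetical. Because the noisy domain must not depend on the data, the variant length has to be bounded, and the number of candidate sequences grows exponentially with that bound. Unseen sequences can also receive positive noisy counts, which appear in the output as fake variants.

```python
import itertools
import random
from collections import Counter


def trace_variants(event_log, max_len):
    """Count trace variants (distinct activity sequences), truncating each case to max_len."""
    return Counter(tuple(trace[:max_len]) for trace in event_log.values())


def laplace(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_variants_laplace(event_log, activities, max_len, epsilon, rng):
    """Hypothetical baseline (not TraVaG): Laplace mechanism over all sequences up to max_len.

    Assumes the activity alphabet is public and each case contributes to exactly one
    (truncated) variant, so the L1 sensitivity of the count vector is 1 and the
    noise scale is 1 / epsilon.
    """
    counts = trace_variants(event_log, max_len)
    released = {}
    for length in range(1, max_len + 1):
        # The domain must be data-independent, so unseen sequences also get noisy counts.
        for variant in itertools.product(activities, repeat=length):
            noisy = round(counts[variant] + laplace(1.0 / epsilon, rng))
            if noisy > 0:
                # A surviving sequence whose true count is 0 is a "fake" variant.
                released[variant] = noisy
    return released


# Usage on a toy log (hypothetical data): 3 + 9 + 27 = 39 candidate sequences.
log = {
    "c1": ["register", "check", "pay"],
    "c2": ["register", "check", "pay"],
    "c3": ["register", "pay"],
}
released = release_variants_laplace(
    log, ["register", "check", "pay"], max_len=3, epsilon=1.0, rng=random.Random(7)
)
true_variants = trace_variants(log, 3)
fake = [v for v in released if v not in true_variants]
print(len(released), "released variants,", len(fake), "of them fake")
```

With `epsilon=1.0` the noise has the same magnitude as a count of 1, so a variant that occurs once is roughly as likely to be dropped as an unseen sequence is to appear as a fake variant. This is the regime with many infrequent variants that the abstract identifies as challenging.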