Privacy-preserving process mining enables the analysis of business processes using event logs, while giving guarantees on the protection of sensitive information about process stakeholders. To this end, existing approaches add noise to the results of queries that extract properties of an event log, such as the frequency distribution of trace variants, for analysis. Noise insertion neglects the semantics of the process, though, and may generate traces not present in the original log. This is problematic: it lowers the utility of the published data and makes noise easily identifiable, as some traces will violate well-known semantic constraints. In this paper, we therefore argue for privacy preservation that incorporates process semantics. For common trace-variant queries, we show how, based on the exponential mechanism, semantic constraints are incorporated to ensure differential privacy of the query result. Experiments demonstrate that our semantics-aware anonymization yields event logs of significantly higher utility than existing approaches.
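To illustrate the general idea of combining the exponential mechanism with semantic constraints, the following is a minimal sketch and not the algorithm of this paper. It assumes a hypothetical setting in which a trace prefix is extended by one activity: candidates that violate a constraint are excluded up front, and the exponential mechanism then chooses among the remaining ones with probability proportional to exp(epsilon * score / (2 * sensitivity)). The activity names, the `violates` predicate, the prefix counts, and all function names are invented for illustration. The sketch further assumes that the constraints are public domain knowledge rather than derived from the log, and that each individual contributes one trace, so that a count has sensitivity 1.

```python
import math
import random


def exponential_mechanism(candidates, score, epsilon, sensitivity=1.0, rng=None):
    """Return one candidate, drawn with probability proportional to
    exp(epsilon * score(c) / (2 * sensitivity))."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    rng = rng or random.Random()
    scores = [score(c) for c in candidates]
    top = max(scores)  # shift by the maximum for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def allowed_extensions(prefix, activities, violates):
    """Keep only the activities whose appended trace satisfies the constraints."""
    return [a for a in activities if not violates(prefix + (a,))]


# Hypothetical activities and constraints (illustration only).
ACTIVITIES = ["register", "triage", "treat", "discharge"]


def violates(trace):
    # Assumed constraints: a trace starts with "register",
    # and nothing may follow "discharge".
    if trace and trace[0] != "register":
        return True
    if "discharge" in trace[:-1]:
        return True
    return False


# Hypothetical prefix frequencies, standing in for counts taken from a log.
PREFIX_COUNTS = {
    ("register",): 100,
    ("register", "triage"): 80,
    ("register", "treat"): 15,
    ("register", "discharge"): 5,
}


def extend_prefix(prefix, epsilon, rng=None):
    """Choose the next activity for a prefix among constraint-satisfying options."""
    options = allowed_extensions(prefix, ACTIVITIES, violates)
    return exponential_mechanism(
        options,
        score=lambda a: PREFIX_COUNTS.get(prefix + (a,), 0),
        epsilon=epsilon,
        sensitivity=1.0,
        rng=rng,
    )


if __name__ == "__main__":
    rng = random.Random(0)
    print(extend_prefix((), epsilon=1.0, rng=rng))
    print(extend_prefix(("register",), epsilon=0.1, rng=rng))
```

Because constraint-violating candidates never enter the selection, every generated extension is semantically valid, which is the property that plain noise insertion lacks. Smaller values of `epsilon` flatten the selection probabilities, trading utility for stronger privacy.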