To enable process analysis based on an event log without compromising the privacy of the individuals involved in process execution, a log may be anonymized. Such anonymization strives to transform a log so that it satisfies provable privacy guarantees, while largely maintaining its utility for process analysis. Existing techniques perform anonymization using simple, syntactic measures to identify suitable transformation operations. As a result, the semantics of the activities referenced by the events in a trace are neglected, potentially leading to transformations in which events of unrelated activities are merged. To avoid this and take the semantics of activities into account during anonymization, we propose to instead use a distance measure based on feature learning. Specifically, we show how embeddings of events enable the definition of a distance measure for traces that guides event log anonymization. Our experiments with real-world data indicate that anonymization using this measure, compared to a syntactic one, yields logs that are closer to the original log in various dimensions and, hence, have higher utility for process analysis.
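To make the idea of an embedding-based trace distance concrete, the following is a minimal sketch and not the measure defined in the paper. It assumes that each activity already has a vector embedding (the toy vectors and activity names below are hand-made and hypothetical) and combines them in an edit distance whose substitution cost is the scaled cosine distance between activity embeddings. The function names `cosine_distance` and `trace_distance` and the parameter `indel_cost` are illustrative choices.

```python
import math


def cosine_distance(u, v):
    """Cosine distance scaled to [0, 1]; 0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: a zero vector is maximally distant
    cos = dot / (norm_u * norm_v)
    # Clamp to guard against floating-point drift outside [0, 1].
    return min(1.0, max(0.0, (1.0 - cos) / 2.0))


def trace_distance(trace_a, trace_b, embeddings, indel_cost=1.0):
    """Edit distance over activity sequences.

    Substituting one activity for another costs their embedding
    distance, so swapping semantically close activities is cheap.
    Inserting or deleting an event costs `indel_cost`.
    """
    n, m = len(trace_a), len(trace_b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel_cost
    for j in range(1, m + 1):
        dp[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cosine_distance(
                embeddings[trace_a[i - 1]], embeddings[trace_b[j - 1]]
            )
            dp[i][j] = min(
                dp[i - 1][j] + indel_cost,   # delete from trace_a
                dp[i][j - 1] + indel_cost,   # insert into trace_a
                dp[i - 1][j - 1] + sub,      # substitute (or match)
            )
    return dp[n][m]


# Hypothetical toy embeddings; in practice these would be learned.
toy_embeddings = {
    "Receive order": [0.5, 0.5],
    "Check invoice": [1.0, 0.0],
    "Verify invoice": [0.9, 0.1],
    "Ship goods": [0.0, 1.0],
    "Pay": [0.2, 0.8],
}

original = ["Receive order", "Check invoice", "Pay"]
similar = ["Receive order", "Verify invoice", "Pay"]
unrelated = ["Receive order", "Ship goods", "Pay"]

d_similar = trace_distance(original, similar, toy_embeddings)
d_unrelated = trace_distance(original, unrelated, toy_embeddings)
```

Under these toy vectors, `d_similar` is smaller than `d_unrelated`, whereas a purely syntactic edit distance would treat both substitutions as equally costly. An anonymization procedure that merges the closest traces would therefore prefer merging `original` with `similar`.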