Process mining techniques enable analysts to identify and assess process improvement opportunities based on event logs. A common roadblock to process mining is that event logs may contain private information that cannot be used for analysis without consent. An approach to overcome this roadblock is to anonymize the event log so that no individual represented in the original log can be singled out based on the anonymized one. Differential privacy is an anonymization approach that provides this guarantee. A differentially private event log anonymization technique seeks to produce an anonymized log that is as similar as possible to the original one (high utility) while providing a required privacy guarantee. Existing event log anonymization techniques operate by injecting noise into the traces in the log (e.g., duplicating, perturbing, or filtering out some traces). Recent work on differential privacy has shown that a better privacy-utility tradeoff can be achieved by applying subsampling prior to noise injection. In other words, subsampling amplifies privacy. This paper proposes an event log anonymization approach called Libra that exploits this observation. Libra extracts multiple samples of traces from a log, independently injects noise, retains statistically relevant traces from each sample, and composes the samples to produce a differentially private log. An empirical evaluation shows that the proposed approach leads to a considerably higher utility for equivalent privacy guarantees relative to existing baselines.
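The claim that subsampling amplifies privacy can be made concrete with the standard amplification bound from the differential privacy literature: if a mechanism is ε-differentially private and is applied to a Poisson subsample in which each record is kept independently with probability q, the overall procedure satisfies ε'-differential privacy with ε' = ln(1 + q(e^ε − 1)). The sketch below is a minimal illustration of that bound only, not Libra's actual algorithm; the function names `amplified_epsilon` and `poisson_subsample`, the toy traces, and the choice of Poisson sampling are assumptions made for this example.

```python
import math
import random


def amplified_epsilon(epsilon: float, q: float) -> float:
    """Return the amplified privacy parameter ln(1 + q * (e^epsilon - 1)).

    epsilon: privacy parameter of the base mechanism (must be >= 0).
    q: probability with which each trace is kept in the subsample (0..1).
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    # log1p/expm1 keep the computation accurate for small epsilon and q.
    return math.log1p(q * math.expm1(epsilon))


def poisson_subsample(traces, q, rng):
    """Keep each trace independently with probability q (Poisson sampling)."""
    return [trace for trace in traces if rng.random() < q]


if __name__ == "__main__":
    # Hypothetical toy log: each trace is a sequence of activity labels.
    log = [("register", "check", "approve"),
           ("register", "check", "reject"),
           ("register", "approve")] * 100

    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    sample = poisson_subsample(log, q=0.1, rng=rng)
    print(f"sampled {len(sample)} of {len(log)} traces")

    # A base mechanism with epsilon = 1.0 run on a 10% sample yields a
    # noticeably smaller effective epsilon for the overall procedure.
    print(f"effective epsilon: {amplified_epsilon(1.0, 0.1):.4f}")
```

Read in the other direction, the bound suggests why the tradeoff improves: to reach a given overall ε', the noise-injection step applied to a sample can use a larger base ε, and therefore less noise, than it would need on the full log. How Libra selects sampling rates, filters traces, and composes the privacy cost across multiple samples is not specified in this abstract and is not modeled here.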