Recent work has designed methods to demonstrate that model updates in ASR training can leak potentially sensitive attributes of the utterances used in computing the updates. In this work, we design the first method to demonstrate information leakage about training data from trained ASR models. We design Noise Masking, a fill-in-the-blank style method for extracting targeted parts of training data from trained ASR models. We demonstrate the success of Noise Masking by using it in four settings for extracting names from the LibriSpeech dataset used for training a SOTA Conformer model. In particular, we show that we are able to extract the correct names from masked training utterances with 11.8% accuracy, while the model outputs some name from the training set 55.2% of the time. Further, we show that even in a setting that uses synthetic audio and partial transcripts from the test set, our method achieves 2.5% correct name accuracy (47.7% any-name success rate). Lastly, we design Word Dropout, a data augmentation method that, when used in training along with MTR, provides utility comparable to the baseline while significantly mitigating extraction via Noise Masking across the four evaluated settings.
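The sketch below illustrates the fill-in-the-blank idea behind Noise Masking under some simplifying assumptions: the time span of the target word (e.g., a name) is assumed to be known, for instance from a forced alignment, and the trained ASR model is abstracted behind a `transcribe_fn` callable. The names `noise_mask`, `attempt_extraction`, and `train_names` are illustrative, not taken from the paper.

```python
import numpy as np


def noise_mask(audio: np.ndarray, sample_rate: int,
               mask_start_s: float, mask_end_s: float,
               noise_std: float = 0.01) -> np.ndarray:
    """Return a copy of `audio` with the target span replaced by Gaussian noise.

    The span is assumed to cover the word being targeted for extraction
    (e.g., a name), with boundaries obtained from a forced alignment.
    """
    masked = audio.copy()
    start = int(mask_start_s * sample_rate)
    end = int(mask_end_s * sample_rate)
    masked[start:end] = np.random.normal(0.0, noise_std, size=end - start)
    return masked


def attempt_extraction(audio, sample_rate, span, transcribe_fn, train_names):
    """Run the model on the noise-masked utterance and score its output.

    `transcribe_fn` stands in for the trained ASR model's decoding call;
    `train_names` is the set of names known to occur in the training data.
    Returns the full hypothesis and whether it contains any training-set name,
    i.e., whether the model "filled in the blank" with memorized content.
    """
    masked_audio = noise_mask(audio, sample_rate, *span)
    hypothesis = transcribe_fn(masked_audio)
    produced_names = [w for w in hypothesis.split() if w.lower() in train_names]
    return hypothesis, bool(produced_names)
```

Aggregating the boolean outcome of `attempt_extraction` over many masked utterances yields the "any name" success rate, while comparing the produced names against the ground-truth masked word yields the correct-name accuracy reported above.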