Recent work has designed methods demonstrating that model updates in ASR training can leak potentially sensitive attributes of the utterances used in computing the updates. In this work, we design the first method to demonstrate information leakage about training data from trained ASR models. We design Noise Masking, a fill-in-the-blank style method for extracting targeted parts of training data from trained ASR models. We demonstrate the success of Noise Masking by using it in four settings to extract names from the LibriSpeech dataset used for training a state-of-the-art Conformer model. In particular, we show that we are able to extract the correct names from masked training utterances with 11.8% accuracy, while the model outputs some name from the training set 55.2% of the time. Further, we show that even in a setting that uses synthetic audio and partial transcripts from the test set, our method achieves 2.5% correct-name accuracy (47.7% any-name success rate). Lastly, we design Word Dropout, a data augmentation method which, when used in training along with Multistyle TRaining (MTR), provides utility comparable to the baseline while significantly mitigating extraction via Noise Masking across the four evaluated settings.
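To make the fill-in-the-blank idea concrete, the sketch below illustrates one plausible form of a Noise Masking probe: the audio span of a targeted word (e.g., a name) is overwritten with noise, the masked utterance is transcribed, and the word the model "fills in" is checked against the true name and against the set of names seen in training. The `asr_model.transcribe` call, the word-level timestamps, and the assumption that the hypothesis aligns word-for-word with the reference are all illustrative placeholders, not the paper's exact procedure.

```python
import numpy as np

def noise_mask(waveform, word_segments, target_idx, sample_rate=16000, noise_std=0.1):
    """Replace the audio of one word with Gaussian noise (fill-in-the-blank probe).

    word_segments: list of (start_sec, end_sec) tuples, one per word in the reference.
    """
    start_s, end_s = word_segments[target_idx]
    start, end = int(start_s * sample_rate), int(end_s * sample_rate)
    masked = waveform.copy()
    masked[start:end] = np.random.normal(0.0, noise_std, end - start)
    return masked

def probe_name_leakage(asr_model, waveform, word_segments, name_idx, true_name, train_set_names):
    """Hypothetical probe: does the model reconstruct a memorized name under the mask?"""
    masked = noise_mask(waveform, word_segments, name_idx)
    hypothesis = asr_model.transcribe(masked).split()  # placeholder ASR API
    filled = hypothesis[name_idx].lower() if name_idx < len(hypothesis) else ""
    return {
        "exact_match": filled == true_name.lower(),   # correct-name extraction
        "any_train_name": filled in train_set_names,  # any-name extraction
    }
```

Aggregating `exact_match` and `any_train_name` over many masked utterances would yield metrics analogous to the correct-name and any-name rates reported above.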