Audio Adversarial Examples (AAEs) are specially crafted inputs designed to trick Automatic Speech Recognition (ASR) systems into misclassification. This work proposes MP3 compression as a means to reduce the impact of Adversarial Noise (AN) on audio samples transcribed by ASR systems. To this end, we generated AAEs with the Fast Gradient Sign Method (FGSM) for an end-to-end, hybrid CTC-attention ASR system. We then validated our method with two objective indicators: (1) Character Error Rates (CERs), which measure the speech decoding performance of four ASR models trained on uncompressed and MP3-compressed data sets, and (2) Signal-to-Noise Ratios (SNRs), estimated for both uncompressed and MP3-compressed AAEs after reconstructing them in the time domain by feature inversion. We found that applying MP3 compression to AAEs indeed reduces the CER compared to uncompressed AAEs. Moreover, feature-inverted (reconstructed) AAEs had significantly higher SNRs after MP3 compression, indicating that AN was reduced. In contrast, applying MP3 compression to utterances augmented with regular noise resulted in more transcription errors, giving further evidence that MP3 encoding is effective at diminishing AN specifically rather than noise in general.
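The three quantities the abstract relies on can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the FGSM step assumes the gradient of the ASR loss with respect to the input features has already been computed elsewhere (e.g. by autodiff), and the SNR assumes the clean utterance is the signal and the difference between the reconstructed AAE and the clean utterance is the noise. MP3 encoding and feature inversion are not shown.

```python
import math


def fgsm_perturb(x, grad, eps):
    """FGSM step: x_adv = x + eps * sign(grad).

    Assumption: `grad` is the gradient of the ASR loss w.r.t. the input
    features `x`, obtained elsewhere; this function only applies the step.
    """
    def sign(g):
        return (g > 0) - (g < 0)  # +1, -1 or 0

    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def levenshtein(ref, hyp):
    """Minimum number of substitutions, insertions and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,             # deletion
                            curr[j - 1] + 1,         # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: edit distance divided by the reference length."""
    return levenshtein(reference, hypothesis) / len(reference)


def snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component.

    Assumption: `clean` and `noisy` are time-aligned sample sequences.
    """
    signal_power = sum(s * s for s in clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy))
    if noise_power == 0:
        return math.inf  # identical signals: no noise at all
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    print(fgsm_perturb([0.5, -0.2, 0.0], [0.3, -1.0, 0.0], eps=0.1))
    print(cer("kitten", "sitting"))  # 3 edits over 6 reference characters
    print(snr_db([1.0, 1.0, 1.0, 1.0], [1.1, 0.9, 1.1, 0.9]))  # about 20 dB
```

Under this framing, a higher SNR for an MP3-compressed AAE means its reconstruction lies closer to the clean utterance, and a lower CER means the ASR transcript lies closer to the reference text.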