Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers

As Audio Large Language Models (ALLMs) emerge as powerful tools for speech processing, their safety implications demand urgent attention. While considerable research has explored text and vision safety, audio's distinct characteristics present significant challenges. This paper first investigates: Are ALLMs vulnerable to backdoor attacks that exploit acoustic triggers? To answer this question, we introduce Hidden in the Noise (HIN), a novel backdoor attack framework designed to exploit subtle, audio-specific features. HIN applies acoustic modifications to raw audio waveforms, such as alterations to temporal dynamics and strategic injection of spectrally tailored noise. These changes introduce consistent patterns that an ALLM's acoustic feature encoder captures, embedding robust triggers within the audio stream. To evaluate ALLM robustness against audio-feature-based triggers, we develop the AudioSafe benchmark, which assesses nine distinct risk types. Extensive experiments on AudioSafe and three established safety datasets reveal critical vulnerabilities in existing ALLMs: (I) audio features such as environmental noise and speech rate variations achieve an average attack success rate of over 90%; (II) ALLMs exhibit significant sensitivity differences across acoustic features, in particular showing minimal response to volume as a trigger; and (III) including poisoned samples causes only marginal fluctuations in the loss curve, highlighting the attack's stealth.
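To make the two kinds of waveform modification mentioned above concrete, the following is a minimal sketch of how a speech-rate change and a band-limited noise injection might be applied to a raw waveform. It is an illustration under stated assumptions, not the HIN implementation: the function names `change_speech_rate` and `add_band_noise`, the 2-4 kHz band, the SNR parameter, and the use of plain linear-interpolation resampling (which also shifts pitch) are all hypothetical choices made here.

```python
import numpy as np


def change_speech_rate(wave, rate):
    """Play the clip `rate` times faster via linear-interpolation resampling.

    Assumption: this naive resampling also shifts pitch; a pitch-preserving
    time stretch would need a different method.
    """
    n_out = max(1, int(round(len(wave) / rate)))
    src_positions = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(src_positions, np.arange(len(wave)), wave)


def add_band_noise(wave, sample_rate, snr_db, band_hz=(2000.0, 4000.0), seed=0):
    """Add Gaussian noise restricted to `band_hz`, scaled to a target SNR in dB.

    Assumption: the band and the fixed seed are illustrative; the fixed seed
    makes the injected pattern identical across clips of the same length.
    """
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(len(wave))
    # Shape the noise spectrum by zeroing every bin outside the chosen band.
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sample_rate)
    mask = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    noise = np.fft.irfft(spectrum * mask, n=len(wave))
    # Scale the noise so that signal power / noise power matches snr_db.
    signal_power = np.mean(wave ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return wave + scale * noise
```

For a one-second clip at 16 kHz, `change_speech_rate(wave, 1.25)` shortens the clip to 12,800 samples, and `add_band_noise(wave, 16000, snr_db=20)` returns a clip of the same length with the shaped noise mixed in at 20 dB below the signal power.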
