Robust Distortion-Free Watermark for Autoregressive Audio Generation Models

The rapid advancement of next-token-prediction models has led to their widespread adoption across modalities, enabling the creation of realistic synthetic media. In the audio domain, autoregressive speech models have moved conversational interaction forward, but they have also increased the potential for misuse, such as impersonation in phishing schemes or the fabrication of misleading speech recordings. Security measures such as watermarking have therefore become essential for ensuring the authenticity of digital media. However, statistical watermarking methods designed for autoregressive language models face challenges when applied to autoregressive audio models because of the inevitable "retokenization mismatch": the discrepancy between the original discrete audio token sequence and the sequence obtained by re-encoding the generated audio into tokens. To address this, we introduce Aligned-IS, a novel distortion-free watermark crafted specifically for audio generation models. The technique uses a clustering approach that treats tokens within the same cluster as equivalent, which counters the retokenization mismatch. Comprehensive experiments on prevalent audio generation platforms show that Aligned-IS preserves the quality of generated audio and significantly improves watermark detectability compared with state-of-the-art adaptations of distortion-free watermarks, establishing a new benchmark for secure audio technology applications.
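To make the cluster idea concrete, the following is a minimal Python sketch of why scoring at the cluster level tolerates retokenization. It is not the Aligned-IS algorithm, whose details the abstract does not give; it swaps in a cluster-level variant of the Gumbel-max (exponential-minimum sampling) watermark used for text language models. All names (`keyed_uniforms`, `toy_probs`, `watermarked_step`, `generate`, `detection_z`, `CLUSTER_OF`) and the key are hypothetical. The sketch assumes a fixed token-to-cluster map, pseudorandom keys derived from the frame position and the previous cluster id, and a retokenization that keeps the sequence length and only swaps tokens within a cluster.

```python
import hashlib
import math
import random


def keyed_uniforms(key: bytes, position: int, prev_cluster: int, num_clusters: int) -> list[float]:
    """One pseudorandom value in (0, 1) per cluster, derived from the secret key,
    the frame position and the *cluster id* (not the raw token id) of the previous frame."""
    ctx = key + position.to_bytes(4, "big") + prev_cluster.to_bytes(4, "big")
    us = []
    for c in range(num_clusters):
        h = hashlib.sha256(ctx + c.to_bytes(4, "big")).digest()
        x = int.from_bytes(h[:8], "big") >> 12  # keep 52 random bits
        us.append((x + 0.5) / 2**52)  # strictly inside (0, 1)
    return us


def toy_probs(step: int, vocab_size: int) -> list[float]:
    """Hypothetical stand-in for an audio LM's next-token distribution."""
    r = random.Random(1000 + step)
    w = [r.random() + 1e-3 for _ in range(vocab_size)]
    total = sum(w)
    return [x / total for x in w]


def watermarked_step(probs, cluster_of, num_clusters, key, position, prev_cluster, rng):
    # Aggregate token probabilities into cluster probabilities.
    p_cluster = [0.0] * num_clusters
    for tok, p in enumerate(probs):
        p_cluster[cluster_of[tok]] += p
    us = keyed_uniforms(key, position, prev_cluster, num_clusters)
    # Exponential-minimum (Gumbel-max) choice: with uniform keys, cluster c wins
    # with probability p_cluster[c], so the cluster marginal is unchanged.
    chosen = min(
        (c for c in range(num_clusters) if p_cluster[c] > 0),
        key=lambda c: -math.log(us[c]) / p_cluster[c],
    )
    # Pick the token inside the chosen cluster from the model's own conditional distribution.
    members = [t for t in range(len(probs)) if cluster_of[t] == chosen]
    token = rng.choices(members, weights=[probs[t] for t in members])[0]
    return token, chosen


def generate(n, vocab_size, cluster_of, num_clusters, key, watermark=True, seed=0):
    rng = random.Random(seed)
    tokens, prev = [], num_clusters  # num_clusters doubles as a start-of-sequence sentinel
    for pos in range(n):
        probs = toy_probs(pos, vocab_size)
        if watermark:
            tok, prev = watermarked_step(probs, cluster_of, num_clusters, key, pos, prev, rng)
        else:
            tok = rng.choices(range(vocab_size), weights=probs)[0]
            prev = cluster_of[tok]
        tokens.append(tok)
    return tokens


def detection_z(tokens, cluster_of, num_clusters, key):
    """z-score of the summed -log(1 - u) statistic, computed on cluster ids only."""
    prev, score = num_clusters, 0.0
    for pos, tok in enumerate(tokens):
        c = cluster_of[tok]
        u = keyed_uniforms(key, pos, prev, num_clusters)[c]
        score += -math.log(1.0 - u)  # Exp(1) per frame when the audio is unwatermarked
        prev = c
    n = len(tokens)
    return (score - n) / math.sqrt(n)


VOCAB, CLUSTERS, KEY = 16, 4, b"hypothetical-secret-key"
CLUSTER_OF = [t // 4 for t in range(VOCAB)]  # hypothetical clustering: 4 adjacent ids per cluster

wm = generate(200, VOCAB, CLUSTER_OF, CLUSTERS, KEY)
# Simulated retokenization mismatch: about 30% of frames are re-drawn as a
# (possibly different) token from the same cluster; length is preserved.
noise = random.Random(7)
retok = [CLUSTER_OF[t] * 4 + noise.randrange(4) if noise.random() < 0.3 else t for t in wm]
plain = generate(200, VOCAB, CLUSTER_OF, CLUSTERS, KEY, watermark=False, seed=1)

print(f"watermarked:   z = {detection_z(wm, CLUSTER_OF, CLUSTERS, KEY):.2f}")
print(f"retokenized:   z = {detection_z(retok, CLUSTER_OF, CLUSTERS, KEY):.2f}")
print(f"unwatermarked: z = {detection_z(plain, CLUSTER_OF, CLUSTERS, KEY):.2f}")
```

Because both the key stream and the score depend only on cluster ids, the within-cluster swaps leave the watermarked and retokenized z-scores identical, while the unwatermarked sequence should score near zero. Each step draws a cluster with probability equal to its total mass and then a token from the model's conditional distribution inside that cluster, so the per-step output follows the model's distribution as long as the hash outputs behave like independent uniforms. Mismatches that cross cluster boundaries, or that insert or delete frames, are outside what this sketch handles.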
