Neural speech codecs have achieved strong performance in low-bitrate compression, but residual vector quantization (RVQ) often suffers from unstable training and ineffective decomposition, limiting reconstruction quality and efficiency. We propose PURE Codec (Progressive Unfolding of Residual Entropy), a novel framework that guides multi-stage quantization using a pre-trained speech enhancement model. The first quantization stage reconstructs low-entropy, denoised speech embeddings, while subsequent stages encode the residual high-entropy components. This design significantly improves training stability. Experiments demonstrate that PURE consistently outperforms conventional RVQ-based codecs in reconstruction quality and in downstream text-to-speech based on speech language models, particularly under noisy training conditions.
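To make the decomposition concrete, the sketch below illustrates the enhancement-guided residual quantization idea described above: the first codebook targets a denoised (low-entropy) embedding, and later stages quantize the remaining high-entropy residual, as in standard RVQ. All names, dimensions, and the nearest-neighbor lookup are illustrative assumptions, not the paper's implementation; the enhancement model is stubbed out.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, codebook):
    # Nearest-codeword lookup under L2 distance: one index per frame.
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx

T, D, K = 50, 16, 64                   # frames, embedding dim, codebook size
noisy = rng.normal(size=(T, D))        # stand-in for a noisy speech embedding
denoised = 0.8 * noisy                 # stub for the pre-trained enhancement model's output

codebooks = [rng.normal(size=(K, D)) for _ in range(4)]

# Stage 1: quantize toward the low-entropy, denoised target.
q1, _ = quantize(denoised, codebooks[0])
recon = q1
residual = noisy - recon               # high-entropy remainder

# Subsequent stages refine the residual, as in conventional RVQ.
for cb in codebooks[1:]:
    q, _ = quantize(residual, cb)
    recon = recon + q
    residual = residual - q

# By construction the stages decompose the input exactly:
print(np.allclose(recon + residual, noisy))  # → True
```

The only change relative to plain RVQ is the first stage's target: it regresses toward the enhanced embedding rather than the raw input, so noise is pushed into the later residual stages.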