We propose MDCTNet, a neural generative model for audio that operates in the perceptually weighted domain of an adaptive modified discrete cosine transform (MDCT). Its architecture uses recurrent layers (RNNs) to capture correlations along both the time and the frequency directions. An audio coding system is obtained by training MDCTNet on a diverse set of fullband monophonic audio signals sampled at 48 kHz, conditioned on a perceptual audio encoder. In a subjective listening test with ten excerpts chosen to be balanced across content types yet stressful for both codecs, the mean performance of the proposed system at 24 kb/s variable bitrate (VBR) is similar to that of Opus at twice the bitrate (48 kb/s).
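The abstract does not spell out the transform itself. As a rough illustration of the kind of representation MDCTNet works on, the following is a minimal NumPy sketch of a plain MDCT with a fixed hop size `N` and a sine window. It assumes a fixed block length and omits the adaptive block switching and perceptual weighting of the actual system, and the function names are hypothetical. Each row of the coefficient array is a time frame and each column a frequency bin, i.e. the two directions along which the model's recurrent layers capture correlations.

```python
import numpy as np


def sine_window(N):
    # Sine window of length 2N; satisfies the Princen-Bradley condition
    # w[n]^2 + w[n + N]^2 = 1 needed for perfect reconstruction.
    n = np.arange(2 * N)
    return np.sin(np.pi * (n + 0.5) / (2 * N))


def mdct_basis(N):
    # (N, 2N) cosine kernel: cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)).
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))


def mdct_analysis(x, N):
    """Hypothetical helper: (frames, N) MDCT coefficients with hop size N."""
    x = np.asarray(x, dtype=float)
    pad_end = (-len(x)) % N  # pad the signal to a multiple of N
    # Extra N zeros at both ends so every real sample is covered by two frames.
    xp = np.concatenate([np.zeros(N), x, np.zeros(pad_end + N)])
    w, C = sine_window(N), mdct_basis(N)
    num_frames = len(xp) // N - 1
    frames = np.stack([xp[i * N:i * N + 2 * N] for i in range(num_frames)])
    return (frames * w) @ C.T  # 2N windowed samples -> N coefficients per frame


def mdct_synthesis(X, N, length):
    """Hypothetical helper: inverse MDCT with windowed overlap-add (TDAC)."""
    w, C = sine_window(N), mdct_basis(N)
    num_frames = X.shape[0]
    out = np.zeros((num_frames + 1) * N)
    # 2/N scaling pairs with the analysis/synthesis sine windows.
    blocks = (2.0 / N) * (X @ C) * w  # (frames, 2N) windowed IMDCT outputs
    for i in range(num_frames):
        # Time-domain aliasing cancels between overlapping halves.
        out[i * N:i * N + 2 * N] += blocks[i]
    return out[N:N + length]


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
X = mdct_analysis(x, 64)
print(X.shape)  # (17, 64): 17 time frames x 64 frequency bins
print(np.allclose(mdct_synthesis(X, 64, len(x)), x))  # True
```

The MDCT is critically sampled, producing `N` coefficients for every `N` new input samples, while the 50% overlap with aliasing cancellation avoids blocking artifacts. This is one reason it is a common domain for perceptual audio coding.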